1. Introduction


The Napalm Graphics Engine is a fifth-generation 3D graphics engine (following Voodoo Graphics, Voodoo2, 
Voodoo Banshee, and Voodoo3) based on the original SST1 architecture. Napalm incorporates all of the 
original SST1 features such as true-perspective texture mapping with advanced mipmapping and lighting, 
texture anti-aliasing, sub-pixel correction, Gouraud shading, depth-buffering, alpha blending and dithering. 
Napalm also has 2 full-featured texturing units, which allow advanced features like trilinear filtering, 
dual-texturing or bump mapping to be performed at the rate of one pixel per clock. 

In addition to the SST1 features, Napalm includes a VGA core, 2D graphics acceleration, and support for 
Intel’s AGP 4x bus. 


Features

SST1 baseline features with 2 texturing units
SST1 software compatible
AGP 2X / AGP 1X / PCI bus compliant
128-bit VGA core
2D acceleration
Binary/Ternary operand raster ops
Screen to Screen, Screen to Texture space, and Texture space to Screen Blits
Color space conversion YUV to RGB
1:N monochrome expansion
Rendering support of 2048x2048
Integrated RAMDAC and PLLs
Bilinear video scaling
Video in via feature connector
Supports SGRAM and SDRAM memories
TV out interface runs at 100 MHz DDR
Video-In:
Operates simultaneously with TV out interface
Decimation
Support for interlaced video data
Support VMI, SAA7110 video connectors
Triple buffers for video-in data
Video-Out:
Bilinear scaling zoom-in (from 1 to 10x magnification in increments of 0.25x)
Decimation for zoom-out (0.25x, 0.5x, 0.75x)
Support for stereoscopic display
Hardware cursor
oo buffer frame buffers for video refresh 
DDC support for monitor communication 
| mode support 


Chroma-keying for video underlying and overlaying 
Overlay windows (for 3D and motion video) 
1.1 Resolutions

Mode Type    # of Colors    Native Resolution    Alpha Format
Graphics     256/256K       640x400              80x25
Graphics     256/256K       640x480              80x30
2. Performance


2.1 2D Performance 


Estimated triangle performance. 


8-bits per pixel, 1024x768 resolution (linear) 


2.2 3D Performance
16-bits per pixel, 640x480

1 pixel gouraud, Z, unlit               TBD tris/sec
5 pixel gouraud, Z, unlit               TBD tris/sec
50 pixel gouraud, Z, unlit              TBD tris/sec
1000 pixel gouraud, Z, unlit            TBD tris/sec
50 pixel Z, bilinear textured           TBD tris/sec
50 pixel Z, trilinear mip-mapped        TBD tris/sec
3. Functional Overview


3.1 System Level Diagrams 


In its entry configuration, a Napalm graphics solution consists of a single ASIC + RAM. When 
configured as a PCI device, Napalm is a PCI slave device that receives commands from the CPU via direct 
writes or through memory-backed FIFO writes. 


Napalm includes an entire VGA core, 2D graphic pipeline, 3D graphics engine, texture raster engine, and 
video display processor. Napalm supports all VGA modes plus a number of VESA modes. 


[Figure: system-level diagram. Labels: AGP/PCI System; Avenger+; Frame Buffer Memory (4/8/16/32/64 
Mbytes of SGRAM or 16/32/64 Mbytes of SDRAM); TV/LCD Monitor.]
3.2 Architectural Overview


3.2.1 Overall Overview 
The diagram below illustrates the overall architecture of the Napalm graphics subsystem. 


[Figure: architecture block diagram. Blocks: PCI/AGP Interface; CMD Fifos; Feature Connector; VIDEO IN; 
VGA; 2D; FBI; TMU; PLL; Memory Controller.]
3.2.2 Detailed Datapath Diagram

[Figure: H3 Data Path. Legible block labels include the PCI/AGP core and register bank, AGP write, read, 
and request buffers, PCI FIFO, memory FIFO, triangle and texture subsystem dispatch, data alignment, 
float-to-fixed conversion, YIQ-to-RGB conversion, color expansion, command dispatch, bilinear blend, 
texture combine, TREX-to-FBI FIFO, SGRAM in/out ports for 2D, FBI, VGA and video, video scaler, color 
space conversion, hardware cursor, and DAC.]

3.2.3 FBI/TMU

[Figure: FBI/TMU block diagram. Legible labels: linear frame buffer access, iterator, ARGB, texture 
memory, RGB mask, apply visibility.]

3.2.4 2D

[Figure: 2D block diagram. Legible labels: LFB host port, 8x8x24, palette, endian, SRC and DST chroma, 
ROP, CLIP, SRC and DST address FIFOs, write buffer, to memory controller.]


3.3 Functional Overview 


Note: This section is horribly out-of-date and inaccurate. It was what was left over from the original 
Avenger spec...Please ignore this section... 


Bus Support: Napalm implements both the PCI bus specification 2.1 and AGP specification 1.0 protocols. 
Napalm is a slave-only device on PCI, and a master device on AGP. Napalm supports zero-wait-state 
transactions and burst transfers. 


PCI Bus Write Posting: Napalm uses an asynchronous FIFO 32 entries deep which allows sufficient write 
posting capabilities for high performance. The FIFO is asynchronous to the graphics engine, thus allowing 
the memory interface to operate at maximum frequency regardless of the frequency of the PCI bus. 
Zero-wait-state writes are supported for maximum bus bandwidth. 


VGA: Napalm includes a 100% IBM PS/2 Model 70 compatible 128-bit VGA core, which is highly 
optimized for 128-bit memory transfers. The VGA core supports the PC ’97 requirements for multiple 
adapters and VGA disable. 

Memory FIFO: Napalm can optionally use off-screen frame buffer memory or AGP memory to increase the 
effective depth of the PCI bus FIFO. The depth of this memory FIFO is programmable, and when used in 
addition to the regular 32-entry host FIFO, allows up to 1 Mbyte of host writes to be queued without 
stalling the PCI interface. Napalm supports 2 independent command streams that are asynchronous to each 
other. Either command stream can be located in AGP memory or frame buffer memory. 


Memory Architecture: The frame buffer controller of Napalm has a 128-bit wide datapath to RGB, 
alpha/depth-buffer, 2D desktop, video, and texture memory with support for up to 200 MHz SGRAMs or 
SDRAMs. For 2D fills using the standard 2D bitBLT engine, eight 16-bit pixels are written per clock, 
resulting in an 800 Mpixel/sec peak fill rate. For screen clears using the color expansion capabilities 
specific to SGRAM, 64 bytes are written per clock, resulting in a Gbytes/sec peak fill rate. For 
Gouraud-shaded or texture-mapped polygons with depth buffering enabled, one pixel is written per clock; 
this results in a 166 Mpixels/sec peak fill rate. The minimum amount of memory supported by Napalm is 
4 Mbytes, with a maximum of 64 Mbytes supported. 
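
The peak fill rates quoted above are each a product of pixels written per clock and clock frequency. The following C sketch shows that arithmetic only; the function names are hypothetical, and the 100 MHz and 166 MHz clocks mentioned afterwards are assumptions chosen because they reproduce the quoted figures, not values stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole pixels that fit in one transfer on a datapath
 * that is bus_bits wide (e.g. 128 bits / 16 bpp = 8 pixels). */
uint32_t pixels_per_clock(uint32_t bus_bits, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel != 0 && bus_bits % bits_per_pixel == 0);
    return bus_bits / bits_per_pixel;
}

/* Peak rate in Mpixels/sec = pixels per clock * clock in MHz. */
uint32_t peak_mpixels_per_sec(uint32_t pixels_per_clk, uint32_t clock_mhz)
{
    return pixels_per_clk * clock_mhz;
}
```

Under these assumptions, `peak_mpixels_per_sec(pixels_per_clock(128, 16), 100)` gives 800 and `peak_mpixels_per_sec(1, 166)` gives 166, which suggests the two quoted figures were computed for different clock rates.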


To store texture bitmaps, the texture memory controller of Napalm shares the 128-bit wide datapath to 
Napalm memory. The texture unit uses sophisticated caching to reduce the memory bandwidth required 
to perform bilinear texture filtering with no performance penalty. The amount of texture memory is 
limited only by the maximum amount of Napalm frame buffer memory. 


Host Bus Addressing Schemes: Napalm occupies a combined 256 Mbytes of memory mapped address 
space, using two PCI memory base address pointers. Napalm also occupies 256 bytes of I/O mapped 
address space for video and initialization registers. The register space of Napalm occupies 6 Mbytes of 
address space, the linear frame buffer occupies 128 Mbytes of address space, the ordered texture download 
port occupies 2 Mbytes of address space, and the 3D pipeline linear frame buffer takes 8 Mbytes of address 
space. 


2D Architecture: Napalm implements a full-featured 128-bit 2D Windows accelerator capable of displaying 
8, 16, 24, and 32 bits-per-pixel screen formats. Napalm supports 1, 8, 16, 24, and 32 bits-per-pixel RGB 
source pixel maps for BitBlts. 4:2:2 and 4:1:1 YUV colorspaces are supported as source bitmaps for host to 
screen BitBlts. Napalm supports screen-to-screen and host-to-screen stretch BitBlts at 100 Mpixels/sec. 
Napalm supports source and destination colorkeying, multiple clip windows, and full support of ternary 
ROPs. Patterned Bresenham line drawing with full ROP support, along with polygon fills, is supported in 
Napalm’s 2D core. Fast solid fills, pattern fills, and transparent monochrome bitmap BitBlts are supported 
in 8 bits-per-pixel, 16 bits-per-pixel, and 32 bits-per-pixel modes. 


Linear Frame Buffer and Texture Access: Napalm supports linear frame buffer, texture download access, 
and 3D pipeline frame buffer access to ease software development and porting. Multiple color formats are 
supported for linear frame buffer writes. Any pixel may be written to the 3D pixel pipeline for fogging, 
lighting, alpha blending, dithering, etc. Texture maps can be downloaded into common Napalm memory 
through standard linear frame buffer space, through 3D pixel pipeline frame buffer access, or through 
the ordered texture memory access address space. 


Triangle-based Rendering: Napalm supports a triangle drawing primitive and supports full floating point 
hardware triangle setup. Triangle primitives may be passed from the CPU to Napalm as independent 
triangles, as part of a triangle strip, or as part of a triangle fan. Only the parameter vertex information is 
required from the host CPU, as Napalm automatically calculates the parameter slope and gradient 
information required for proper triangle iteration. 


Additional drawing primitives such as spans and lines are rendered as special case triangles. Complex 
primitives such as quadrilaterals must be decomposed into triangles before they can be rendered by 
Napalm. 


Gouraud-shaded Rendering: Napalm supports Gouraud shading by providing RGBA iterators with 
rounding and clamping. The host provides starting RGBA and ΔRGBA information, and Napalm 
automatically iterates RGBA values across the defined span or trapezoid. 

Texture-mapped Rendering: Napalm supports full-speed texture mapping for triangles. The host provides 
starting texture S/W, T/W, 1/W information, and Napalm automatically calculates their slopes Δ(S/W), 
Δ(T/W), and Δ(1/W) required for triangle iteration. Napalm automatically performs proper iteration and 
perspective correction necessary for true-perspective texture mapping. During each iteration of triangle 
walking, a division by 1/W is performed to correct for perspective distortion. Texture image dimensions 
must be powers of 2 and less than or equal to 256. Rectilinear and square texture bitmaps are supported. 
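
The per-pixel division described above can be sketched in C as follows. This is a minimal software illustration of iterating S/W, T/W and 1/W linearly and dividing to recover S and T; the type and function names are hypothetical, and floating point is used for clarity, whereas the hardware's internal number formats are not described in this section.

```c
#include <assert.h>

/* Values that are linear in screen space and can be iterated per pixel. */
typedef struct {
    float s_over_w;
    float t_over_w;
    float one_over_w;
} TexIter;

/* Advance the iterators by one pixel using the per-pixel slopes in dx. */
void tex_iter_step(TexIter *it, const TexIter *dx)
{
    it->s_over_w   += dx->s_over_w;
    it->t_over_w   += dx->t_over_w;
    it->one_over_w += dx->one_over_w;
}

/* Recover the perspective-correct (S, T) by dividing by the iterated 1/W. */
void tex_iter_resolve(const TexIter *it, float *s, float *t)
{
    assert(it->one_over_w != 0.0f);
    *s = it->s_over_w / it->one_over_w;
    *t = it->t_over_w / it->one_over_w;
}
```

Iterating S and T directly, without the division, would give affine texture mapping, which is where the perspective distortion mentioned in the text comes from.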


Texture-mapped Rendering with Lighting: Texture-mapped rendering can be combined with Gouraud 
shading to introduce lighting effects during the texture mapping process. The host provides the starting 
Gouraud shading RGBA as well as the starting texture S/W, T/W, 1/W, and Napalm automatically 
calculates their slopes ΔRGBA, Δ(S/W), Δ(T/W) required for triangle iteration. Napalm automatically 
performs the proper iteration and calculations required to implement the lighting models and texture 
lookups. A texel is either modulated by (multiplied by), added to, or blended with the Gouraud shaded 
color. The selection of color modulation or addition is programmable. 


Texture Mapping Anti-aliasing: Napalm allows for anti-aliasing of texture-mapped rendering with support 
for texture filtering and mipmapping. Napalm supports point-sampled, bilinear, and trilinear texture filters. 
While point-sampled and bilinear are single-pass operations, Napalm supports trilinear texture filtering as a 
two-pass operation. 


In addition to supporting texture filtering, Napalm also supports texture mipmapping. Napalm 
automatically determines the mipmap level based on the mipmap equation, and selects the proper texture 
image to be accessed. Additionally, the calculated mipmap LOD may be biased and/or clamped to allow 
software control over the sharpness or “fuzziness” of the rendered image. When performing point-sampled 
or bilinear filtered texture mapping, dithering of the mipmap levels can also optionally be used to remove 
mipmap “banding” during rendering. Using dithered mipmapping with bilinear filtering results in images 
almost indistinguishable from full trilinear filtered images. 


Texture Map Formats: Napalm supports a variety of 4-bit, 8-bit, 16-bit, and 32-bit texture formats as listed 
below: 


Napalm includes an internal 256-entry texture palette, which can be downloaded directly from the host 
CPU or via a command to load the palette directly from texture memory. Either during downloads or 
rendering, software programs a palette offset register to control which portion of the texture palette is to be 
used. 


In one scheme, texture data compression is accomplished using a “narrow channel” YAB compression 
scheme. 8-bit YAB format is supported. The compression is based on an algorithm which compresses 
24-bit RGB to an 8-bit YAB format with little loss in precision. The compression scheme is called “YAB” 
because it effectively creates a unique color space for each individual texture map; examples of potential 
color spaces utilized include YIQ, YUV, etc. This YAB compression algorithm is especially suited to 
texture mapping, as textures typically contain very similar color components. The algorithm is performed 
by the host CPU, and YAB compressed textures are passed to Napalm. 


Depth-Buffered Rendering: Napalm supports hardware-accelerated depth-buffered rendering with minimal 
performance penalty when enabled. The standard 8 depth comparison operations are supported. To 
eliminate many of the Z-aliasing problems typically found on 16-bit Z-buffer graphics solutions, Napalm 
allows the (1/W) parameter to be used as the depth component for hardware-accelerated depth-buffered 
rendering. When the (1/W) parameter is used for depth-buffering, a 16-bit floating point format is 
supported. A 16-bit floating point (1/W)-buffer provides much greater precision and dynamic range than a 
standard 16-bit Z-buffer, and reduces many of the Z-aliasing problems found on 16-bit Z-buffer systems. 


To handle co-planar polygons, Napalm also supports depth biasing. To guarantee that polygons which are 
co-planar are rendered correctly, individual triangles may be biased with a constant depth value - this 
effectively accomplishes the same function as stenciling used in more expensive graphics solutions but 
without the additional memory costs. 


Pixel Blending Operation: Napalm supports alpha blending functions which allow incoming source pixels 
to be blended with current destination pixels. An alpha channel (i.e., destination alpha) stored in offscreen 
memory is only supported when depth-buffering is disabled. The alpha blending function is as follows: 


Dnew ← (S · α) +/- (Dold · β) 

where 

Dnew    The new destination pixel being written into the frame buffer 
S       The new source pixel being generated 
Dold    The old (current) destination pixel about to be modified 
α       The source pixel alpha function. 
β       The destination pixel alpha function. 
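
As an illustration of the blending function, the sketch below applies it to a single 8-bit color channel in C. The function name, the representation of the source and destination factors as 0..255 fractions, the rounding, and the clamping to 0..255 are all assumptions made for this example; the section does not specify how the hardware encodes or rounds these values.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel: result = src*src_factor +/- dst*dst_factor,
 * with factors given as 0..255 fractions (255 meaning 1.0).
 * The result is clamped to the 0..255 range. */
uint8_t blend_channel(uint8_t src, uint8_t src_factor,
                      uint8_t dst, uint8_t dst_factor, int subtract)
{
    int s = (src * src_factor + 127) / 255;   /* rounded S * alpha  */
    int d = (dst * dst_factor + 127) / 255;   /* rounded Dold * beta */
    int r = subtract ? s - d : s + d;

    if (r < 0)   r = 0;
    if (r > 255) r = 255;
    return (uint8_t)r;
}
```

For conventional transparency, the source factor would be the source alpha and the destination factor its complement (255 minus the source alpha), applied to each of the R, G and B channels.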


Fog: Napalm supports a 64-entry lookup table to support atmospheric effects such as fog and haze. When 
enabled, a 6-bit floating point representation of (1/W) is used to index into the 64-entry lookup table. The 
output of the lookup table is an “alpha” value which represents the level of blending to be performed 
between the static fog/haze color and the incoming pixel color. Low order bits of the floating point (1/W) 
are used to blend between multiple entries of the lookup table to reduce fog “banding.” The fog lookup 
table is loaded by the host CPU, so various fog equations, colors, and effects are supported. 
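
The table lookup and blend described above can be sketched in C as follows. The function names are hypothetical, and the split of the (1/W) value into a 6-bit table index and an 8-bit fraction is taken as an input here because the exact floating point layout is not given in this section; the 8-bit fraction width and the rounding are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define FOG_TABLE_SIZE 64

/* Look up the fog alpha for a table index, blending toward the next
 * entry by frac/256 to reduce banding between adjacent entries. */
uint8_t fog_alpha(const uint8_t table[FOG_TABLE_SIZE],
                  unsigned index, unsigned frac)
{
    assert(index < FOG_TABLE_SIZE && frac < 256);
    unsigned next = (index + 1 < FOG_TABLE_SIZE) ? index + 1 : index;
    int a0 = table[index];
    int a1 = table[next];
    return (uint8_t)(a0 + ((a1 - a0) * (int)frac) / 256);
}

/* Blend one 8-bit channel between the incoming color and the fog color;
 * alpha = 0 keeps the pixel color, alpha = 255 gives the fog color. */
uint8_t fog_apply(uint8_t color, uint8_t fog_color, uint8_t alpha)
{
    return (uint8_t)((color * (255 - alpha) + fog_color * alpha + 127) / 255);
}
```

Because the host fills the table, the same lookup serves linear, exponential, or any other fog curve.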
Chroma-Key and Chroma-Range Operation: Napalm supports a chroma-key operation used for transparent 
object effects. When enabled, an outgoing pixel is compared with the chroma-key register. If a match is 
detected, the outgoing pixel is invalidated in the pixel pipeline, and the frame buffer is not updated. In 
addition, a superset of chroma-keying, known as chroma-ranging, may be used. Instead of matching 
outgoing pixels against a single chroma-key color, chroma-ranging uses a range of colors for the 
comparison. If the outgoing pixel is within the range specified by the chroma-range registers and 
chroma-ranging is enabled, then the frame buffer is updated with the pixel. 


Color Dithering Operations: All operations internal to Napalm operate in native 32-bit ARGB pixel mode. 
However, color dithering from the 24-bit RGB pixels to 16-bit RGB (5-6-5) pixels is provided on the back 
end of the pixel pipeline. Using the color dithering option, the host can pass 24-bit RGB pixels to Napalm, 
which converts the incoming 24-bit RGB pixels to 16-bit RGB (5-6-5) pixels which are then stored in the 
16-bit RGB buffer. The 16-bit color dithering allows for the generation of photorealistic images without 
the additional cost of a true color frame buffer storage area. 
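
To make the 24-bit to 16-bit (5-6-5) conversion concrete, here is a C sketch using a 4x4 ordered (Bayer) dither. The choice of a 4x4 Bayer matrix, the function names, and the way the threshold is scaled are assumptions for illustration; this section does not describe the dither pattern the hardware actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* 4x4 Bayer threshold matrix, values 0..15 (assumed pattern). */
static const uint8_t bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Reduce an 8-bit channel to `bits` bits, adding a position-dependent
 * bias (threshold t in 0..15) before the low bits are discarded. */
static unsigned dither_channel(unsigned v, unsigned bits, unsigned t)
{
    unsigned drop = 8 - bits;            /* number of discarded low bits  */
    unsigned max  = (1u << bits) - 1;    /* largest representable value   */
    unsigned bias = (t << drop) >> 4;    /* scale t to the discarded bits */
    unsigned q    = (v + bias) >> drop;
    return q > max ? max : q;
}

/* Convert one 24-bit RGB pixel at screen position (x, y) to RGB 5-6-5. */
uint16_t dither_rgb888_to_565(uint8_t r, uint8_t g, uint8_t b,
                              unsigned x, unsigned y)
{
    unsigned t = bayer4[y & 3][x & 3];
    return (uint16_t)((dither_channel(r, 5, t) << 11) |
                      (dither_channel(g, 6, t) << 5)  |
                       dither_channel(b, 5, t));
}
```

The same input color rounds up at some screen positions and down at others, so the average over a small area approximates the original 24-bit value.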


Programmable Video Timing: Napalm uses a programmable video timing controller which allows for very 
flexible video timing. Any monitor type may be used with Napalm, with 76+ Hz vertical refresh rates 
supported at 800x600 resolution, and 100+ Hz vertical refresh rates supported at 640x480 resolution. 
Lower resolutions down to 320x200 are also supported. 


Video Output Gamma Correction: Napalm uses a programmable color lookup table to allow for 
programmable gamma correction. The 16-bit dithered color data from the frame buffer is used as an index 
into the gamma-correction color table; the 24-bit output of the gamma-correction color table is then fed to 
the monitor. 


Video Overlay: Napalm supports one full-featured video overlay that is unlimited in size, and supports 
pixel formats of YUV 411, YUV 422, RGB (1-5-5-5), RGB (5-6-5), and RGB (x-8-8-8). The video 
overlay can be double, triple or quad buffered, and can be bilinear scaled to full screen resolutions. 


Video In: VMI video in port with complete host port is fully supported in Napalm. Video in is double 
buffered and can be optionally deinterlaced by replicating lines in a single frame or by merging 2 frames 
together. 


PLL/DAC: Napalm contains 3 independent PLLs for clock generation. The PLLs are fully 
programmable, giving the capability to change video, graphics, and memory clocks to any specified 
frequency. Napalm supports a high speed 300 MHz RAMDAC, capable of doing 1600x1280 @ 76Hz 
refresh. 


3.4 Modifications from SST1 


Colbufsetup 
Auxbufsetup 

Chroma Range 

intrCtrl, userIntrCMD 
fbiTriangles register 

Full triangle setup registers 
Fogmode 
Fogtable 


fbzColorPath 

fbzMode 
Copyright © 1996-1999 3dfx Interactive, Inc. Revision 1.13 
3dfx Confidential. Printed 10/24/2019 
For Internal Use Only 
Napalm Graphics Engine 
Increase of rendering window to -4k to 4k 
Additional clip rectangle 

Byte access lfb 

New command fifo interface 

Texture mirroring 

Addition of VGA core 

Addition of Video surfaces 

Additional 6666 palettized texture format 
Full featured 2D accelerator engine. 
Separate filter controls for Alpha, and RGB. 

Combined TMU unit 

Increased blending fraction from 1.4 to 1.8. 

Separate register / LFB byte swizzling for big endian machines. 
PC ’97 compliant 


3.5 Additions to Avenger from Banshee 


Higher core clock frequency = 143 MHz. 

Graphics core and memory interface now all run on a master graphics clock. 

300MHz RAMDAC. 

0.25 micron, SLM technology 

452pin BGA 

AGP 2x support 

On-chip command fifo RAM to increase AGP command fifo performance. 

Programmable watermarks for lfb/cmdfifo write fifo (pciInit0); can increase efficiency of 
command transport. 

2 TMUs to enable single-cycle special effects such as trilinear filtering, dual-texturing and 
bump-mapping. 

tsplit functionality added back in to the TMUs. 

Video fetch performance modification (controlled with CYA in vidProcCfg); boosts video 
performance by making video fifo thresholds more effective. 

Improved performance for minified textures (texture fetch engine was modified). 

Adjustable delay for TV-out clock. 

Support for simultaneous VMI and TV-out. 

Additional internal status observability registers: cmdStatus0, cmdStatus1. 

Removal of separate mclk domain (mclk domain is now gclk domain). 

Two device IDs supported: 5 = high-speed Napalm, 4 = lower speed sort; different PLL 
programming is required depending on device ID: see section on PLL programming. 
Programming Notes on Avenger vs. Banshee 


Video register changes per TV-out interface: addition of (VidInStatusCurrentLine, 
vidTvOutBlankHCount, vidTvOutBlankVCount, vidInFormat, 
vidSerialParallelPortRegister, vidInYDecimDeltas) 

Additional flushing code required around texture downloads (Maintaining Cache Coherency, 
section 18.3) 

Additional texture download aperture: see Napalm Address Space and Command Packet 5 
sections. 

Software should try to tune video fifo watermarks to boost performance, given the enhanced video 
fetch logic. 

Programming of PLL depends on device ID: id==5 -> m, n, k are all fully programmable; 
id==4 -> m is fixed to 0x24; see section 9. 

Problem with VGA-space P6-style write combining is fixed. 

Board Note: Because of the presence of an AGP pll, it is strongly recommended that the chip not 
be run in AGP pll bypass mode. 

SDRAM fastfillCMD command must still be done by using just the color-plane fill. 

Swapbuffer pending count logic is fixed, and will increment/decrement as described in the 
documentation. 
4. Napalm Address Space 


MemoryBase0 (128 MBytes) 

Memory Address    Register 
[Table contents not recoverable. One surviving entry: I/O register remap (see I/O section below).] 


MemoryBase1 (128 MBytes) 

Memory Address 
0x0000000 - 0x7FFFFFF 


I/O Base 

I/O Address       Register 
                  Initialization registers 
                  PLL and DAC registers 
                  Video Registers part I 
                  vidChromaMin register 
                  vidOverlayDudx register 
                  vidOverlayDvdy register 
                  VGA Registers 
                  Video Registers part II 
0xe8 - 0xeb 
0xec - 0xef 
0xf0 - 0xf3       vidInAddr1 register 
0xf4 - 0xf7 
0xf8 - 0xfb 
0xfc - 0xff       vidCurrOverlayStartAddr register 
VGA Address Space 
5. VGA Register Set 


5.1 Overview of the Napalm VGA Controller 


The Napalm VGA core supports all standard VGA modes with full backward compatibility. This allows 
the 3D controller to be able to share the frame buffer with the 2D controller, thereby saving total solution 
cost. 


In addition to the legacy VGA, Napalm also supports VESA BIOS extensions. This is accomplished by 
extending the standard register set and implementing a flexible memory aperture such that VBE 
applications can page select memory through the standard VGA address space. 


5.2 Using VGA Registers When Napalm is not the Primary VGA 


For systems that do not require VGA, or in which a VGA device already exists, Napalm allows the use of 
the VGA registers in an extended fashion. In this mode, VGA registers are not decoded in legacy VGA 
space, but in relocatable I/O and memory space. 


Napalm should be powered on with the device type set to ‘Multimedia Device’ through the strapping 
registers. Napalm will then not respond to any legacy I/O or memory space. In order to use the VGA 
registers, Napalm should be set up to be a motherboard device (VGAINIT0 bit 8), and bit 0 of I/O base + 
0xc3 should be set to 1. 


In this configuration, all of the VGA registers (except 0x46e8 and 0x0102) are available by truncating the 
leading ‘0x03’ from the legacy address, and adding that address to the I/O base address. 
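
The address translation described above amounts to keeping the low byte of the legacy port and adding it to the I/O base. A minimal C sketch follows; the function name is hypothetical, and the I/O base value in the usage note is only an example, since the real base is assigned through PCI configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Translate a legacy VGA port of the form 0x03xx into its relocated
 * address: drop the leading 0x03 and add the remainder to the I/O base.
 * Ports 0x46e8 and 0x0102 are excluded by the text and rejected here. */
uint32_t vga_relocated_port(uint32_t io_base, uint16_t legacy_port)
{
    assert((legacy_port & 0xFF00u) == 0x0300u);
    return io_base + (legacy_port & 0x00FFu);
}
```

With an example I/O base of 0xE000, the CRTC index port 0x3D4 would be reached at 0xE0D4, and the enable port 0x3C3 at 0xE0C3, which matches the "I/O base + 0xc3" offset mentioned earlier in this section.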


Note that in this configuration, however, memory is not accessible through the VGA aperture. 


5.3 Locking VGA Timing for Virtualized Modes 


When running VGA applications in a window, it is possible to restrict changes to the VGA timing registers 
set. This is accomplished by setting the lock bits in vgaInit1. The locks prevent applications from 
changing the values in the associated registers. 


5.4 Setting VGA Timing for Video 2 Pixels per Clock Mode 


For extended resolutions that run at frequencies greater than 135 MHz, the Video Unit must be 
placed in 2 pixels per clock mode. This implies that the video clock is divided by 2 (see dacMode). Since 
the clock is running at half the frequency, all horizontal timing register values should also be halved. 


Note: All horizontal video timing must be divisible by 16 pixels. 
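
The halving rule and the divisible-by-16 note fit together if a character clock is taken to be 8 pixels wide, which is an assumption in the sketch below (as is the function name). Under that assumption, a horizontal timing value in pixels becomes a character count by dividing by 8, and the 2 pixels per clock mode halves it again, so the pixel value must divide evenly by 16.

```c
#include <assert.h>

#define PIXELS_PER_CHAR_CLOCK 8u   /* assumption: 8-pixel character clocks */

/* Convert a horizontal timing value in pixels to the character count
 * used in 2 pixels per clock mode (before any per-register offset such
 * as "minus 5" or "minus 1" is applied). */
unsigned horiz_chars_2ppc(unsigned pixels)
{
    assert(pixels % 16u == 0);                  /* rule from the note */
    unsigned chars = pixels / PIXELS_PER_CHAR_CLOCK;
    return chars / 2u;                          /* half-rate video clock */
}
```

For example, a 1600-pixel active width would give 100 character clocks in this mode rather than 200.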
This section outlines the compatible VGA register set followed by a brief description of their operation. 


CRTC Index Register 
Horizontal Total 

Horizontal Display Enable End 
Start Horizontal Blanking 
End Horizontal Blanking 
Start Horizontal Retrace 

End Horizontal Retrace 
Vertical Total 

Overflow 

Preset Row Scan 

Maximum Scan Line 

Cursor Start 

Cursor End 

Start Address High 

Start Address Low 

Cursor Location High 

Cursor Location Low 

Vertical Retrace Start 
Vertical Retrace End 

Vertical Display Enable End 
Offset 

Underline Location 

Start Vertical Blank 

End Vertical Blank 

CRTC Mode Control 

Line Compare 

Horizontal Extension Register 
Vertical Extension Register 
Extension Byte 0/ PCI Configuration 
Extension Byte 1 


Extension Byte 2 
Extension Byte 3 
Latch Read Back 
Attribute Controller Index/Data State 


CRTC Register Set 


Read Port Register Name 


Miscellaneous Output 
Input Status Register 0 
Input Status Register 1 


Motherboard Enable 

Adapter Enable 

Subsystem Enable 
General Register Set 


Register Name 


Sequencer Index Register 
Reset 

Clocking Mode 

Map Mask 

Character Map Select 
Memory Mode 


Sequencer Register Set 


Register Name 


Graphics Controller Index Register 
Set/Reset 

Enable Set/Reset 

Color Compare 

Data Rotate 

Read Map Select 

Graphics Mode 

Miscellaneous 

Color Don't Care 

Bit Mask 


Graphics Controller Register Set 


Register Name 


Palette Registers 

Attribute Mode Control Register 
Over Scan Control Register 
Color Plane Enable Register 
Horizontal PEL Panning Register 
Color Select Register 


Attribute Controller Register Set 


Register Name 
Pixel Mask 
Read Index 
Read Status 
Write Index 




RAMDAC Register Set 
5.5 General Registers: 


5.5.1 Input Status 0 (0x3C2) 
Description 


Interrupt Status. When its value is “1”, denotes that an interrupt is pending. 


Feature Connector. These 2 bits are readable bits from the feature connector. 
Sense. This bit reflects the state of the DAC monitor sense logic. 


Reserved. Read back as 0. 


Data written to port 0x3C2 is stored in the Miscellaneous Output Register (0x3CC). 


5.5.2 Input Status 1 (0x3BA/0x3DA) 
7:6 
5:4 Display Status. These 2 bits reflect 2 of the 8 pixel data outputs from the Attribute 
controller, as determined by the Attribute controller index 0x12 bits 4 and 5. 
3 
2:1 


R Display Disable. When this bit is 1, either horizontal or vertical display end has occurred, 
otherwise video data is being displayed. 


5.5.3 Feature Control Write (0x3BA/0x3DA) 

Description 
Reserved 
Vertical Sync Select 
Reserved 
Feature Control 


5.5.4 Feature Control Read (0x3CA) 

Description 
Reserved 
Video Status. Reads back two bits of the VGA video stream. See 0x3c0, index 0x12, bits 5:4. 
Feature Control 
Vertical Sync Select 
| 
5.5.5 Miscellaneous Output (0x3CC) 

Description 
Vertical Sync Polarity (0 = positive, 1 = negative). 
Horizontal Sync Polarity (0 = positive, 1 = negative). 
Page Select. When in Odd/Even mode, selects the high 64k bank if set. 
Reserved 
Clock Select 
RAM Enable (1 = Enable) 
CRTC I/O Address (1 = Color, Base Address = 0x3D?; 0 = Mono, Base Address = 0x3B?). 


Data is written to this register via port 0x3C2. Bits 7:6 also indicate the number of lines on the display, 
while bits 3:2 select the video clock frequency. 

Bits 7:6    Lines       Bits 3:2    Clock 
0           Reserved    0           25.175 MHz 


5.5.6 Motherboard Enable (0x3C3) 

Description 
Reserved 
Video Subsystem enable 


5.5.7 Adapter Enable (0x46E8) 

Description 
Reserved 
Setup Mode 
Video Subsystem Enable 
ROM Bank Address - Unused. 


5.5.8 Subsystem Enable (0x102) 

Bit    R/W    Description 
0      W      Global Subsystem enable 




5.6 CRTC Registers: 


The CRTC registers are responsible for the video timing on Napalm. By default, Napalm is a 100% 
compatible VGA. However, Napalm can also be set up to drive much larger resolutions than that allowed 
by the VGA standard. 


[Figure: CRTC timing diagram. Horizontal extents, measured from the left edge: Horizontal Display End, 
Horizontal Blanking Start, Horizontal Retrace Start, Horizontal Retrace End, Horizontal Blanking End, 
Horizontal Total. Vertical extents, measured from the top edge: Vertical Display End, Vertical Blanking 
Start, Vertical Retrace Start, Vertical Retrace End, Vertical Blanking End, Vertical Total. Regions: Active 
Display Area, Horizontal Border, Horizontal Blank, Vertical Border, Vertical Blank.] 
The following chart indicates the bit locations for the timing registers. 

[Table: bit locations for the Total, Blank Start, Blank End, and Sync Start timing fields, given as 
index[bit]. Most cells are illegible; surviving entries include Blank Start 15[7] and 15[0], and Blank End 
16[2] and 16[0].] 


5.6.1 CRTC Index Register (0x3B4/0x3D4) 


This register provides index information for any subsequent accesses to 0x3B5/0x3D5. 


5.6.2 Index 0x0-Horizontal Total (0x3B5/0x3D5) 


This register defines the total width of the display in character clocks, including retrace time, minus 5. Bit 
8 of this register is found in the Horizontal Extension Register (index 0x1A) bit 0. 


Total Horizontal Character Count less 5. 


The 5 character clocks are reserved to provide adequate prefetch time for the beginning data on the first 
line. 
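
The encoding described above (total character count minus 5, with the low eight bits in index 0x0 and bit 8 in index 0x1A bit 0) can be sketched in C as follows. The struct and function names are hypothetical, and the sketch only computes the field values; it does not perform the indexed port writes.

```c
#include <assert.h>
#include <stdint.h>

/* Field values for the Horizontal Total setting. */
typedef struct {
    uint8_t index0;     /* CRTC index 0x00: low 8 bits of (total - 5)    */
    uint8_t ext_bit0;   /* bit 0 of CRTC index 0x1A: bit 8 of (total - 5) */
} HTotalRegs;

/* Encode a total line width, in character clocks including retrace. */
HTotalRegs encode_htotal(unsigned total_chars)
{
    assert(total_chars >= 5 && total_chars - 5 <= 0x1FFu);
    unsigned v = total_chars - 5;

    HTotalRegs r;
    r.index0   = (uint8_t)(v & 0xFFu);
    r.ext_bit0 = (uint8_t)((v >> 8) & 1u);
    return r;
}
```

A 100-character line programs index 0x0 with 95 and leaves the extension bit clear, while a 300-character line programs 0x27 and sets the extension bit.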


5.6.3 Index 0x1-Horizontal Display Enable End (0x3B5/0x3D5) 


This register defines the total number of visible horizontal characters on the display, minus one. Bit 8 of 
this register is found in the Horizontal Extension Register (index 0x1A) bit 2. 


Display Active Characters -1. 
5.6.4 Index 0x2-Start Horizontal Blanking (0x3B5/0x3D5) 


Horizontal blanking begins when the horizontal character counter reaches this character clock value. Bit 8 
of this register is found in the Horizontal Extension Register (index 0x1A) bit 4. 


Description 


7:0 Start Horizontal Blanking 


5.6.5 Index 0x3-End Horizontal Blanking (0x3B5/0x3D5) 


Bit   Description
7     Compatibility Read. When this bit is set to ‘1’, Vertical Sync Start and Vertical Sync End are
      both readable and writeable. When set to ‘0’, these registers are still writeable, but not
      readable.
6:5   Display Enable Signal Skew Control. These bits define the display enable signal skew time
      in relation to horizontal synchronization pulses.
4:0   End Horizontal Blanking. The End Horizontal Blank value is the value of the start blanking
      register plus the blanking width W in character clocks. The least significant five bits are
      programmed in this register, while bit 5 is in the End Horizontal Sync Register (index 0x5)
      bit 7.


5.6.6 Index 0x4-Start Horizontal Sync (0x3B5/0x3D5) 


This register contains the character count at which horizontal sync output pulse becomes active. Bit 8 of 
this register is found in the Horizontal Extension Register (index 0x1A) bit 6. 


Start Horizontal Sync Character Count.


5.6.7 Index 0x5-End Horizontal Sync (0x3B5/0x3D5)


Bit   Description
7     Horizontal Blank Overflow Bit 5. MSB (bit 5) of the End Horizontal Blanking Register.
6:5   Horizontal Sync Skew. These bits define the number of character clocks the horizontal sync
      signal is skewed.
4:0   End Horizontal Sync Pulse Width “W”. The Start Horizontal Sync register value is added to the
      pulse width W in character clocks. The least significant five bits of the sum are programmed in
      this register. When the low five bits of the horizontal character counter match these five bits,
      the horizontal retrace signal is turned off.
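

The five bit wrap described above can be sketched in C as follows. The field positions (bit 7, bits 6:5,
bits 4:0) are assumed to follow the standard VGA layout of this register, and the function name and
parameters are illustrative only.

```c
#include <stdint.h>

/* Builds the value for CRTC index 0x5.
   sync_start: Start Horizontal Sync character count
   width_w:    sync pulse width W in character clocks
   skew:       Horizontal Sync Skew (0 to 3)
   blank_end:  End Horizontal Blanking value; only its bit 5 is stored here */
uint8_t encode_end_hsync(unsigned sync_start, unsigned width_w,
                         unsigned skew, unsigned blank_end)
{
    uint8_t end5  = (uint8_t)((sync_start + width_w) & 0x1F);  /* bits 4:0 */
    uint8_t skew2 = (uint8_t)((skew & 0x3) << 5);              /* bits 6:5 */
    uint8_t hb5   = (uint8_t)(((blank_end >> 5) & 1) << 7);    /* bit 7    */
    return (uint8_t)(hb5 | skew2 | end5);
}
```

Because only five bits are stored, a sync pulse that starts at character 85 with a width of 12 is programmed
as (85 + 12) & 0x1F = 1.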
5.6.8 Index 0x6-Vertical Total (0x3B5/0x3D5)


The least significant eight bits of the count of raster scan lines for a display frame, less 2. Time for
vertical retrace and vertical sync is also included. The ninth and tenth bits (bits 8 and 9) of this count
are loaded into the Vertical Overflow Register (index 0x7) bit 0 and bit 5 respectively. Bit 10 of this
count is found in the Vertical Extension Register (index 0x1B) bit 0.


Description 


7:0 Raster Scan Line Total Less 2. 


5.6.9 Index 0x7-Overflow (0x3B5/0x3D5) 
This register contains ‘Overflow’ bits from other CRTC registers. 


Bit   R/W   Description                           Base Index
7     R/W   Vertical Sync Start Bit 9.            0x10
6     R/W   Vertical Display Enable End Bit 9.    0x12
5     R/W   Vertical Total Bit 9.                 0x6
4     R/W   Line Compare Bit 8.                   0x18
3     R/W   Start Vertical Blank Bit 8.           0x15
2     R/W   Vertical Retrace Start Bit 8.         0x10
1     R/W   Vertical Display Enable End Bit 8.    0x12
0     R/W   Vertical Total Bit 8.                 0x6


5.6.10 Index 0x8-Preset Row Scan (0x3B5/0x3D5)


Bit   R/W   Description
7     R     Reserved.
6:5   R/W   Byte Panning Control. These bits allow up to 3 bytes to be panned in modes programmed as
            multiple shift modes.
4:0         Preset Row Scan Count. These bits preset the vertical row scan counter once after each
            vertical retrace. This counter is incremented after each horizontal retrace period, until the
            maximum row scan count is reached. When the maximum row scan count is reached, the
            counter is cleared. This register can be used for smooth vertical scrolling of text.


Bits 6:5   Panning
00         0 Pixels
01         8 Pixels
10         16 Pixels
11         24 Pixels
5.6.11 Index 0x9-Maximum Scan Line (0x3B5/0x3D5)


Bit   R/W   Description
7           Line Doubling. 0 = Normal operation, 1 = Activate line doubling.
6     R/W   Line Compare. Bit 9 of the Line Compare Register (index = 0x18).
5           Start Vertical Blank. Bit 9 of the Start Vertical Blank Register (index = 0x15).
4:0         Maximum Scan Line. Maximum number of scanned lines for each row of characters. The
            value programmed is the maximum number of scanned rows per character minus 1.


5.6.12 Index 0xA-Cursor Start (0x3B5/0x3D5)


Bit   R/W   Description
7:6         Reserved. Defaults to 0.
5           Cursor Control. 0 = Cursor on, 1 = Cursor off.
4:0   R/W   Cursor Start Scan Line. These bits specify the row scan counter value within the character
            box where the cursor begins. These bits contain the value of the character row less 1. If this
            value is programmed with a value greater than the Cursor End Register (index = 0xB), no
            cursor is generated.


5.6.13 Index 0xB-Cursor End (0x3B5/0x3D5) 


Bit   Description
7     Reserved. Defaults to 0.
6:5   Cursor Skew Bits. Delays the displayed cursor to the right by the skew value in character
      clocks, e.g., 1 character clock skew moves the cursor right by 1 position on the screen.
4:0   Cursor End Scan Line. These bits specify the last row scan counter value within the
      character box during which the cursor is active. If this value is less than the cursor start
      value, no cursor is displayed.


5.6.14 Index 0xC-Start Address High (0x3B5/0x3D5) 


The eight high order bits of the 16 bit video memory address used for screen refresh. The low order eight
bit register is at index 0xD.


Display Screen Start Address Upper Byte Bits. 


5.6.15 Index 0xD-Start Address Low (0x3B5/0x3D5) 
The lower order eight bits of the 16 bit video memory address. 


Start Address Low Byte. 
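

A minimal sketch of how a 16 bit start address maps onto the two registers, index 0xC (high byte) and index
0xD (low byte). The function names are illustrative and do not come from the Napalm documentation.

```c
#include <stdint.h>

/* Splits a 16 bit display start address into the bytes for
   CRTC index 0xC (high) and CRTC index 0xD (low). */
void split_start_address(uint16_t addr, uint8_t *high, uint8_t *low)
{
    *high = (uint8_t)(addr >> 8);
    *low  = (uint8_t)(addr & 0xFF);
}

/* Inverse operation: rebuilds the address from the two register values. */
uint16_t join_start_address(uint8_t high, uint8_t low)
{
    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

The same split applies to the Cursor Location High and Low registers at index 0xE and 0xF.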


5.6.16 Index 0xE-Cursor Location High (0x3B5/0x3D5) 


The eight higher order bits of the 16 bit cursor location in VGA modes. For the lower order eight bits, see the
Cursor Location Low Register at index 0xF.


Description


Cursor Address Upper Byte Bits. 


5.6.17 Index 0xF-Cursor Location Low (0x3B5/0x3D5) 


Description 


Cursor Address Lower Byte Bits. The lower order eight bits of the 16 bit video memory 
address. 


5.6.18 Index 0x10-Vertical Retrace Start (0x3B5/0x3D5) 


The lower eight bits of the ten bit Vertical Retrace Start Register. Bits 8 and 9 are located in the 
Overflow Register (index = 0x7). Bit 10 is in the Vertical Extension Register (index 0x1B) bit 6. 


Vertical Sync Start Pulse Lower Eight Bits. 


5.6.19 Index 0x11-Vertical Retrace End (0x3B5/0x3D5)


Bit   R/W   Description
7     R/W   CRTC Registers Write Protect. When this bit is 0, writes to CRT Controller index registers
            0x0 to 0x7 are enabled. When this bit is 1, writes to CRT Controller index registers in the
            range of index 0x0 to 0x7 are protected, except line compare bit 4 in the Overflow Register
            (index 0x7).
6     R/W   DRAM Refresh/Horizontal Scan Line. Historically, this register selected DRAM refresh
            cycles per horizontal scan line. This function is not implemented.
4     R/W   Clear Vertical Retrace Interrupt. (0 = Clear vertical retrace interrupt, 1 = Allow an
            interrupt to be generated after the last displayed scan of the frame has occurred, i.e., at
            the start of the bottom border.)
3:0   R/W   Vertical Retrace End. This field specifies the scan count at which vertical sync becomes
            inactive. For a retrace signal pulse width W, add W in scan lines to the value of the
            Vertical Retrace Start Register. The low 4 bits of the result are written in the Vertical
            Retrace End Register.
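

The two operations described for this register can be sketched as follows: computing the 4 bit Vertical
Retrace End field, and clearing the write protect bit so that indices 0x0 to 0x7 can be written. Function
names are illustrative assumptions; only the bit positions come from the text.

```c
#include <stdint.h>

/* Low four bits of (Vertical Retrace Start + pulse width W in scan lines). */
uint8_t encode_vretrace_end(unsigned retrace_start, unsigned width_w)
{
    return (uint8_t)((retrace_start + width_w) & 0x0F);
}

/* Returns the index 0x11 value with bit 7 cleared, so that writes to
   CRTC indices 0x0 to 0x7 are enabled. All other bits are preserved. */
uint8_t unlock_crtc(uint8_t reg11)
{
    return (uint8_t)(reg11 & 0x7F);
}
```

With a retrace start of 490 and a width of 2 scan lines, the sum is 492 (0x1EC) and the field holds 0xC.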


5.6.20 Index 0x12-Vertical Display Enable End (0x3B5/0x3D5) 


This register specifies the eight lower bits of the register that defines where the active display frame
ends. The programmed count is in scan lines minus 1. Bits 8 and 9 are in the Overflow Register (index
0x7) at bit positions 1 and 6 respectively. Bit 10 is in the Vertical Extension Register (index 0x1B) bit 2.


7:0 Vertical Display Enable End Lower Eight Bits. 
5.6.21 Index 0x13-Offset (0x3B5/0x3D5)


This register specifies the width of display memory in terms of an offset from the current row start address 
to the next character row. The offset value is a word address adjusted for word or double word display 
memory access. 


Description 


Logical Line Screen Width. 


5.6.22 Index 0x14-Underline Location (0x3B5/0x3D5) 


Bit   Description
7     Reserved.
6     Double Word Mode. (0 = Display memory addressed for byte or word access, 1 = Display
      memory addressed for double word access.)
5     Count By 4 For Double Word Access. (0 = Memory address counter clocked for byte or
      word access, 1 = Memory address counter is clocked at the character clock divided by 4.)
4:0   Underline Location. These bits specify the row scan counter value within a character matrix
      where the underline is to be displayed. Load a value 1 less than the desired scan line number.


5.6.23 Index 0x15-Start Vertical Blank (0x3B5/0x3D5) 


The lower eight bits of the ten bit Start Vertical Blank Register. Bit 8 is in the Overflow Register (index = 
0x7) and bit 9 is in the Maximum Scan Line Register (index = 0x9). The ten bit value is reduced by 1 
from the desired scan line count where the vertical blanking signal starts. 


Description 


Start Vertical Blank Lower Eight Bits. 


5.6.24 Index 0x16-End Vertical Blank (0x3B5/0x3D5) 


Vertical Blank Inactive Count. 


End Vertical Blank is an 8 bit value calculated as follows: 
End Vertical Blank = (Start Vertical Blank - 1) + (Vertical Blank signal width in scan lines). 
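

The formula above, truncated to the 8 bits that the register holds, can be sketched as follows. The function
name is illustrative, and the truncation to 8 bits is an assumption based on the stated register width.

```c
#include <stdint.h>

/* End Vertical Blank = (Start Vertical Blank - 1) + blank width in scan
   lines, kept to the 8 bits stored in CRTC index 0x16. */
uint8_t encode_end_vblank(unsigned start_vblank, unsigned width_lines)
{
    return (uint8_t)(((start_vblank - 1) + width_lines) & 0xFF);
}
```

For a Start Vertical Blank of 488 and a blanking width of 30 scan lines, the sum is 517 and the register
receives 517 & 0xFF = 5.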
5.6.25 Index 0x17-CRTC Mode Control (0x3B5/0x3D5)


Bit   R/W   Description
            otherwise in byte address mode bit 0 appears on MA0. Setting this bit to 1 selects MA15 for
            odd/even mode.
3           Count by 2. (0 = Character clock increments the memory address counter, 1 = Character
            clock divided by 2 increments the address counter.)
2     R/W   Horizontal Retrace Clock Rate Select For Vertical Timing Counter. (0 = Normal, 1 = Selects
            horizontal retrace clock rate divided by 2.)
1           Select Row Scan Counter. 0 = Selects row scan counter bit 1 as output at the MA14 address
            pin. 1 = Selects bit 14 of the CRTC address counter as output at the MA14 pin.
0           6845 CRT Controller compatibility mode support for CGA operation. 0 = Row scan address
            bit 0 is substituted for memory address bit 13 at the MA13 output pin during active display
            time. 1 = Enable memory address bit 13 to be output at the MA13 address pin.


5.6.26 Index 0x18-Line Compare (0x3B5/0x3D5) 

7:0   R/W   Line Compare Lower Eight Bits. Lower eight bits of the ten bit Scan Line Compare
            Register. Bit 8 is in the Overflow Register (index = 0x7) and bit 9 is in the Maximum
            Scan Line Register (index = 0x9). When the vertical counter reaches this value, the
            internal start of the line counter is cleared.


5.6.27 Index 0x1A-Horizontal Extension Register (0x3B5/0x3D5) 


This register is an extension of the VGA core that increases the total horizontal resolution available to
Napalm. This register is only active when VGAINIT0 bit 6 is ‘1’.


Bit   R/W   Description
7     R/W   Horizontal Retrace End bit 5.
6     R/W   Horizontal Retrace Start bit 8.
5     R/W   Horizontal Blank End bit 6.
4           Horizontal Blank Start bit 8.
3     R/W   Reserved.
2           Horizontal Display Enable End bit 8.
1     R/W   Reserved.
0     R/W   Horizontal Total bit 8.
5.6.28 Index 0x1B-Vertical Extension Register (0x3B5/0x3D5)


This register is an extension of the VGA core in order to increase the total Vertical resolution available to 
Napalm. This register is only active when VGAINIT0 bit 6 is ‘1’. 


Bit   R/W   Description
7           Reserved.
6     R/W   Vertical Retrace Start bit 10.
5           Reserved.
4           Vertical Blank Start bit 10.
3     R/W   Reserved.
2     R/W   Vertical Display Enable End bit 10.
1     R/W   Reserved.
0     R/W   Vertical Total bit 10.
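

As an illustration of how an extended vertical value is spread over three registers, the sketch below
encodes a Vertical Total. It assumes the placements given in this chapter: bits 7:0 in index 0x6, bits 8 and
9 in the Overflow Register (index 0x7) bits 0 and 5, and bit 10 in this register (index 0x1B) bit 0. The
type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t low;        /* CRTC index 0x6: bits 7:0 of the count          */
    uint8_t overflow;   /* bits to merge into index 0x7 (bits 0 and 5)    */
    uint8_t extension;  /* bit to merge into index 0x1B (bit 0)           */
} vtotal_fields;

/* total_lines: raster scan lines per frame, including retrace. */
vtotal_fields encode_vtotal(unsigned total_lines)
{
    assert(total_lines >= 2 && total_lines - 2 <= 0x7FF);
    unsigned v = total_lines - 2;                 /* register holds total less 2 */
    vtotal_fields f;
    f.low       = (uint8_t)(v & 0xFF);
    f.overflow  = (uint8_t)(((v >> 8) & 1) |          /* bit 8 to 0x7 bit 0   */
                            (((v >> 9) & 1) << 5));   /* bit 9 to 0x7 bit 5   */
    f.extension = (uint8_t)((v >> 10) & 1);           /* bit 10 to 0x1B bit 0 */
    return f;
}
```

The returned overflow and extension values only contain the Vertical Total bits; a caller would merge them
into the existing register contents so that the other fields are preserved.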


5.6.29 Index 0x1C-PCI Config/Extension Byte 0 (0x3B5/0x3D5) 


On power up, Napalm is configured to allow read back of the PCI configuration information a byte at a 
time through this register. In order to use this feature, first follow the standard wake up sequence. To 
selectively read back configuration information, write the index into this register. Data read back from this 
register is the configuration byte at that index. 


Description 


PCI Configuration/Scratch Pad Register. 


The use of the extended register space is decoded as follows: 


VGAINIT0 
7 6 Description 


Allow Configuration data to be read back from PCI (Indexed) 


1 


Extended registers Are scratch Pad 
Extended registers Disabled 


5.6.30 Index 0x1D-Extension Byte 1 (0x3B5/0x3D5) 
This register is only active when VGAINIT0 bit 6 is ‘1’.


Description 


7:0 Scratch Pad Register. 
5.6.31 Index 0x1E-Extension Byte 2 (0x3B5/0x3D5)
This register is only active when VGAINIT0 bit 6 is ‘1’.


Description 


7:0 Scratch Pad Register. 


5.6.32 Index 0x1F-Extension Byte 3 (0x3B5/0x3D5) 
This register is only active when VGAINIT0 bit 6 is ‘1’.


Description 


Scratch Pad Register. 


5.6.33 Index 0x20-Vertical Counter pre-load Low (0x3B5/0x3D5) 


This register, in combination with index 0x21, allows the vertical counter to be pre-loaded for testing
purposes. The vertical counter is pre-loaded on reset, which can be caused either through a hard reset or a
soft reset. This register is only active when VGAINIT0 bit 6 is ‘1’.


Bit Description 
7:0 Scratch Pad Register. 


5.6.34 Index 0x21-Vertical Counter pre-load High (0x3B5/0x3D5)
This register is only active when VGAINIT0 bit 6 is ‘1’.


Description 
Scratch Pad Register. 


5.6.35 Index 0x22-Latch Read Back (0x3B5/0x3D5) 


R/W   Latch Data Register. This register reflects the contents of one of the four Graphics Data
      Controller latches. The plane selected for read back is determined by the Graphics Controller
      Read Map Select Register (index 0x4) bits 0 and 1.


5.6.36 Index 0x24-Attribute Controller Index/Data State (0x3B5/0x3D5) 


Bit   R/W   Description
7     R     Attribute Controller Index/Data State. When this is 1, the Attribute controller register is
            set to ‘Data’ state. When set to 0, the Attribute controller register is set to ‘Index’ state.
            Reading 0x3DA will always put the Attribute Controller back to Index State.
6:0   R     Reserved.
5.6.37 Index 0x26-Display Bypass/Attribute Controller Index (0x3B5/0x3D5)


Bit   R/W   Description
            Display Bypass. Reflects the value of the Attribute Controller index register, bit 5.
            Attribute index.
5.7 Graphics Controller Registers


5.7.1 Graphics Controller Index Register (0x3CE)


Data written to this 8 bit register selects the index of the Graphics Controller register space accessed
through 0x3CF.


Description 


Reserved. 
Index for accesses at 0x3CF. 


5.7.2 Index 0-Set/Reset (0x3CF) 


When the CPU executes a display memory write with Write Mode 0 selected and the Enable Set/Reset
Register (index = 0x1) activated, the eight bits of the value in this register, which have been operated on
by the Mask Register, are then written to the corresponding display memory map. It is an eight bit fill
operation. The map designations are defined below:


[O [ Reset 


5.7.3 Index 1-Enable Set/Reset (0x3CF)


R/W   Enable Set/Reset Register (Index 0x0). When Write Mode 0 is selected, these bits enable
      memory map access defined by the Set/Reset Register (index = 0x0), and the respective
      memory map is written with the Set/Reset Register value.


5.7.4 Index 2-Color Compare (0x3CF) 

The Color Compare Register contains the value to which all 8 bits of the corresponding memory map are
compared. This comparison also occurs across all four maps, and a 1 is returned for the map positions where
the bits of all four maps equal the Color Compare Register. If a system read is done with bit 3 = 0 in the
Graphics Mode Register (index = 0x5), data is returned without comparison. Color compare map coding is shown
below.
5.7.5 Index 3-Data Rotate (0x3CF)


Bit   Description
7:5   Reserved.
4:3   Function Select. Function select for any of the write mode operations defined in the
      Graphics Mode Register (index = 0x5) is defined in the following table.
2:0   Rotate Count. These bits specify the number of positions of rotation to the right. The rotate
      count is ineffective in write mode 2, defined by the Graphics Mode Register (index = 0x5).


5.7.6 Index 4-Read Map Select (0x3CF)


1:0   R/W   Map Select. These bits select the memory map in system read operations. It has no effect on
            color compare read mode. In odd/even modes, the value can be 0x0 or 0x1 to select chained
            maps 0 & 1, or 0x2 or 0x3 to select the chained maps 2 & 3.


5.7.7 Index 5-Graphics Mode (0x3CF) 


Bit   Description
6:5   Shift Mode.
      00 = data is shifted out normally.
      01 = data is shifted out Even/Odd.
      1x = 256 Color Mode shift.
4     CGA compatible Odd/Even Mode. When set to ‘1’, sequential addressing is as defined by
      bit 2 of the Memory Mode Register (index = 0x4) in the Sequencer Registers. Even system
      addresses access maps 0 or 2 and odd system addresses access maps 1 or 3.
3     Read Mode. When set to 0, the system reads data from memory maps selected by the Read Map
      Select Register (index 0x4). This setting has no effect if bit 3 of the Sequencer Memory
      Mode Register = 1. When set to 1, the system reads the comparison of the memory maps and
      the Color Compare Register.
1:0   Write Mode. The table below defines the four write modes.
1:0   Write Mode   Description
00    0            CPU data or data from the Set/Reset Register is written to graphics memory.
01    1            Latch data is written to graphics memory.
10    2            Plane n is filled with data bit n.
11    3            The addressed byte in each plane is filled with the value of the
                   corresponding bits in the Set/Reset Register (index 0x0). The Enable
                   Set/Reset Register (index 0x1) has no effect. Rotated CPU data is
                   logically ANDed with the Mask Register (index 0x8).


5.7.8 Index 6-Miscellaneous (0x3CF) 


Bit   R/W   Description
7:4         Reserved.
3:2         Memory Map 1,0. Display memory map control into the CPU address space is shown in the
            following table.
1           Odd/Even Mode. When set to 1, CPU address A0 is replaced by a higher order address bit.
            A0 is then used to select odd or even maps. A0 = 0 selects map 0 or 2, while A0 = 1 selects
            map 1 or 3.
0     R/W   Graphics/Alphanumeric Mode. 0 = Alphanumeric mode, 1 = Graphics mode.


Bit 3   Bit 2   Address    Size   Modes
0       0       0xA0000    128K   None
0       1       0xA0000    64K    EGA/VGA/Extended Graphics Modes
1       0       0xB0000    32K    Monochrome Text Modes


5.7.9 Index 7-Color Don’t Care (0x3CF) 


Memory Map Color Compare Operation. 1=Enable, 0 = Disable. 
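

The color compare read (read mode 1) combines the Color Compare Register with this register. The following
C model is a sketch of that comparison for one byte position in the four maps. It assumes that bits 3:0 of
each register correspond to maps 3 to 0, as in standard VGA; the function name is illustrative.

```c
#include <stdint.h>

/* planes[p]: byte read from memory map p.
   compare:   Color Compare Register, bit p is the expected value for map p.
   care:      Color Don't Care Register, bit p = 1 enables the compare for map p.
   Returns a byte with a 1 in every bit position where all enabled maps match. */
uint8_t color_compare_read(const uint8_t planes[4],
                           uint8_t compare, uint8_t care)
{
    uint8_t result = 0xFF;
    for (int p = 0; p < 4; p++) {
        if (!((care >> p) & 1))
            continue;                               /* map does not take part */
        uint8_t want = ((compare >> p) & 1) ? 0xFF : 0x00;
        result &= (uint8_t)~(planes[p] ^ want);     /* 1 where the bits match */
    }
    return result;
}
```

With all four maps enabled, a pixel position returns 1 only if its 4 bit color equals the Color Compare
value. Disabling a map removes that bit of the color from the test.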


5.7.10 Index 8-Mask (0x3CF) 

The mask operation applies simultaneously to all four maps. In Write Modes 0 and 2, this register provides
selective changes to any bit stored in the system latches during processor writes. Data must first be
latched by reading the addressed byte. After setting the Mask Register, new data is written to the same
byte in a subsequent operation. The mask operation is applicable to any data written by the processor.


Mask. 0 = Mask, 1 = Disable mask. 
5.8 Attribute Registers


5.8.1 Attribute Index Register (0x3C0) 


The Attribute Index Register has an internal flip-flop, rather than an input bit, which controls the selection 
of the Address and Data Registers. Reading the Input Status Register 1 (port = 0x3BA/0x3DA) clears the 
flip flop and selects the Address Register, which is read through address 0x3C1 and written at address 
0x3C0. Once the Address Register has been loaded with an index, the next write operation to 0x3C0 will 
load the Data Register. The flip-flop toggles between the Address and the Data Registers after every write
to address 0x3C0, but does not toggle for reads from address 0x3C1.
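

The flip-flop behaviour can be illustrated with a small software model. This is not driver code: it only
models the toggle described above, and every name in it is hypothetical. On real hardware the three steps
in attr_set_register would be a read of 0x3BA/0x3DA followed by two writes to 0x3C0.

```c
#include <stdint.h>

/* Software model of the 0x3C0 index/data flip-flop. */
typedef struct {
    int     data_state;   /* 0 = next write is an index, 1 = next write is data */
    uint8_t index;        /* latched Address Register value                     */
    uint8_t regs[32];     /* attribute data registers                           */
} attr_model;

/* Models a read of Input Status Register 1: clears the flip-flop. */
void attr_read_status(attr_model *m)
{
    m->data_state = 0;
}

/* Models a write to 0x3C0: loads index or data, then toggles. */
void attr_write_3c0(attr_model *m, uint8_t value)
{
    if (m->data_state == 0)
        m->index = value;
    else
        m->regs[m->index & 0x1F] = value;
    m->data_state = !m->data_state;
}

/* Access sequence: reset the flip-flop, write the index, write the data. */
void attr_set_register(attr_model *m, uint8_t index, uint8_t value)
{
    attr_read_status(m);
    attr_write_3c0(m, index);
    attr_write_3c0(m, value);
}
```

Resetting the flip-flop first makes the sequence independent of whatever state earlier accesses left
behind.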


Bit   Description
7:6   Reserved.
5     Palette Address Source. (0 = Disable palette outputs, 1 = Enable palette outputs.)
4:0   Attribute Controller Index Register Address Bits.


5.8.2 Index 0x0 through 0xF-Palette Registers (0x3C0/3C1) 


The Palette Registers are effectively a lookup table 6 bits wide by 16 levels deep. The purpose of this 
lookup table is to allow dynamic color mapping from the original video data stream. The palette provides a 
translation from 4 bits to 6 bits of data. The palette output data is either combined with the Color Select
Register (index 0x14), or the results of two shifts are appended together, resulting in an 8 bit video
stream.


5.8.3 Index 0x10-Attribute Mode Control Register (0x3C0)


Bit   R/W   Description
7           VID5, VID4 Select. (0 = Use palette outputs, 1 = Use Color Select Register index 0x14.)
6     R/W   Pixel Width. (0 = one pixel every VCLK, 1 = one pixel every 2 VCLK.)
5     R/W   Pixel Panning Compatibility. (0 = Enable Pixel Pan on line compare, 1 = disable on line
            compare.)
4           Reserved.
3           Intensity/Blink Selection. (0 = MSB of attribute is background color, 1 = MSB of
            attribute is blink.)
2           Line Graphics Character Code. Setting this bit to 0 forces the ninth dot to be the same
            color as the background in line graphics character codes. Setting this bit to 1 forces the
            ninth dot to be identical to the eighth character dot. Set this to zero for character fonts
            that do not utilize line graphics character codes.
1           Mono/Color Emulation. (0 = Color, 1 = Mono.)
0     R/W   Graphics/Alphanumeric Mode Enable. (0 = alphanumeric, 1 = graphics.)


5.8.4 Index 0x11-Over Scan Control Register (0x3C0)
This register determines the over scan or border color. For monochrome displays, this register is set to 0.


Description
Over Scan/Border Color


5.8.5 Index 0x12-Color Plane Enable Register (0x3C0)


Bit   R/W   Description
7:6   R     Reserved.
5:4   R/W   Video Status Control. These bits select 2 out of 8 color outputs which can be read by the
            Input Status Register 1 (port = 0x3BA/0x3DA) bits 4 and 5.
3:0         Color Plane Enable. Setting a bit to 0 disables the respective color plane(s).


5.8.6 Index 0x13-Horizontal Pixel Panning Register (0x3C0)

These bits select the pixel shift to the left horizontally. For 9 dots/character modes, up to 8 pixels can be
shifted horizontally to the left. Likewise, for 8 dots/character modes, up to 7 pixels can be shifted
horizontally to the left. For 256 color modes, up to 3 pixel position shifts can occur.


Horizontal Pixel Panning. See table.


[Table: pixel shift selected by register values 0x0 through 0xF. The entries are not legible in the source.]


5.8.7 Index 0x14-Color Select Register (0x3C0)


Bit   Description
3:2   Color Value MSB. The two most significant bits of the eight bit color value for the video
      DAC. They are normally used in all modes except 256 color graphics.
1:0   Substituted Color Value Bits. These bits can be substituted for the VID5 and VID4 outputs of
      the Attribute Controller palette registers to create the eight bit color value. They are selected
      by the Attribute Controller Mode Control Register (index = 0x10).
5.9 Sequencer Registers


5.9.1 Sequencer Index Register (0x3c4) 


Description 


7:0 Index for accesses at 0x3c5. 


5.9.2 Index 0-Reset (0x3c5) 


Bit   Description
7:2   Reserved.
1     Synchronous Reset. 0 = Video timing is cleared and halted. This is used to synchronize
      changing either bit 3 or bit 2 of the Miscellaneous Output Register. 1 = Operational
      mode.
0     Asynchronous Reset. 0 = Sequencer is cleared and halted asynchronously. This bit is used
      to force the Sequencer into a reset state, regardless of the operation it is performing.
      1 = Operational mode.


5.9.3 Index 1-Clocking Mode (0x3c5)


Bit   Description
5     Screen Off. When this bit is set to 1 the screen is turned off and all requests for video FIFO
      refresh are disabled, allowing additional bandwidth for other memory operations. SYNC
      signals remain active.
4     Video Serial Shift Register Loading. When this bit is 0, serial shift registers are loaded
      every character or every other character clock depending on bit 2 of this register. When
      this bit is 1, serial shift registers are loaded every 4th character clock (32 fetches).
3     Dot Clock Selection. (0 = Normal dot clock selected by VCLK input frequency, 1 = Dot
      clock divided by 2, used for 320/360 pixel width display modes.)
2     Shift Load. This is only effective if bit 4 of this register = 0. (0 = Video serializers are
      loaded every character clock, 1 = Video serializers are loaded every other character clock.)


5.9.4 Index 2-Map Mask (0x3c5) 


Map Enables. If a bit is 0, writing to the corresponding map(0-3) is disabled. 


5.9.5 Index 3-Character Map Select (0x3c5) 

If Sequencer Register index 4 bit 1 is 1, then attribute byte bit 3 in text modes is redefined to control
switching between character sets in alphanumeric modes. An attribute bit of 0 selects character map B, while
a 1 selects character map A.
Bit   R/W   Description
7:6         Reserved.
5           Character Map A High Select. The most significant bit (MSB) of character map A, along
            with bits 3 and 2, selects the location of character map A as shown below.
4     R/W   Character Map B High Select. The MSB of character map B, along with bits 1 and 0, selects
            the location of character map B as shown below.
3:2         Character Map Select A. Refer to the Character Map A Select table.
1:0         Character Map Select B. Refer to the Character Map B Select table.


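[Tables: Character Map A Select and Character Map B Select. Each lists the 8K table locations in maps 2 or 3
selected by the bits above. The individual entries are not legible in the source.]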
5.9.6 Index 4-Memory Mode (0x3c5)


Bit   R/W   Description
7:4         Reserved.
3           Chain 4 Maps. (0 = Processor sequentially accesses data using the map mask register, 1 = The
            two lower order video memory address pins (MA0, MA1) select the map to be addressed.)
2     R/W   Odd/Even. Bit 3 of this register must be 0 for this bit to be effective. (0 = Odd/Even Mode,
            1 = Sequential addressing.)
1           Extend Memory. (0 = restrict size to 16/32K, 1 = allow 256K.)
0     R     Reserved.
5.10 RAMDAC Registers


5.10.1 RAMDAC Pixel Mask (0x3c6)


Description
RAMDAC pixel mask


The contents of this register are logically ANDed with the output of the VGA data stream before it is 
presented to the RAMDAC. The value of this register has no effect on modes other than VGA. 


5.10.2 RAMDAC Read Index /Read Status (0x3c7) 


Description
RAMDAC Read Index
RAMDAC State. 0 = a read operation is in effect, 3 = a write operation is in effect.


When data is written to this register, it causes the CLUT to go into a ‘Read State’. It should be followed by
three consecutive reads of 0x3c9 in order to retrieve the red, green and blue values of the CLUT. This
index will auto increment following the completion of the last data read. Note that only the first 256
locations of the CLUT may be accessed via this port.


When data is read from this register, bits 1:0 indicate the read/write state of the CLUT. 


5.11 RAMDAC Write Index (0x3c8) 


Description
RAMDAC Write Index


When data is written to this register, it causes the CLUT to go into a ‘Write State’. It should be followed by
three consecutive writes of 0x3c9 in order to store the red, green and blue values of the CLUT. This index
will auto increment following the completion of the last data write. Note that only the first 256 locations of
the CLUT may be accessed via this port.
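

The write sequence can be illustrated with a small software model of the CLUT. The model is a sketch only:
the structure and function names are hypothetical, and the wrap of the index from 255 back to 0 is an
assumption that the text does not state.

```c
#include <stdint.h>

/* Software model of the first 256 CLUT entries and the write index. */
typedef struct {
    uint8_t clut[256][3]; /* red, green, blue for each entry      */
    uint8_t index;        /* current write index                  */
    int     component;    /* 0 = red, 1 = green, 2 = blue         */
} ramdac_model;

/* Models a write to 0x3c8: selects an entry and enters the Write State. */
void ramdac_write_index(ramdac_model *d, uint8_t index)
{
    d->index = index;
    d->component = 0;
}

/* Models a write to 0x3c9: stores one component and auto-increments
   the index after the third (blue) write. */
void ramdac_write_data(ramdac_model *d, uint8_t value)
{
    d->clut[d->index][d->component] = value;
    if (++d->component == 3) {
        d->component = 0;
        d->index++;            /* assumed to wrap from 255 to 0 */
    }
}
```

Because of the auto increment, a block of consecutive palette entries needs a single index write followed by
three data writes per entry.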


5.11.1 RAMDAC Data (0x3c9)


Description
RAMDAC palette data


This register contains the data written to the CLUT. Data in this register is either 6 bit (VGA compatible)
or 8 bit, as determined by VGAINIT0 bit 2. When data is in 6 bit format, the 2 MSBs are replicated into
the 2 LSBs to maintain full scale and linearity on the DAC.
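

The bit replication described above can be expressed in C as follows. The function name is illustrative;
the operation itself is the one stated in the text.

```c
#include <stdint.h>

/* Expands a 6 bit color component to 8 bits by shifting it into the
   upper six bits and copying its two MSBs into the two LSBs. */
uint8_t expand_6_to_8(uint8_t v6)
{
    v6 &= 0x3F;
    return (uint8_t)((v6 << 2) | (v6 >> 4));
}
```

Replication maps the 6 bit maximum 0x3F to the 8 bit maximum 0xFF, whereas a plain left shift by two would
stop at 0xFC.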
6. Accessing memory in VESA modes


VGA is restricted to only see 128K of memory through 0x0A0000. This supports baseline VGA graphics 
modes well; however, extended resolutions and video color depths in VESA modes require use of more 
memory than that allowed by the VGA standard. 


Access to the entire frame buffer is available in VESA modes through a method of re-mapping the 
0x0A0000 host memory space into part of the video memory. Memory accessed through 0x0A0000 in 
VESA modes is unaffected by the settings of the Graphics Control or Sequencer Registers. 


There are two aperture controls, one for reading memory and one for writing memory. This allows
memory to be moved between addresses more than 64K apart without frequently modifying the aperture
pointers. Each aperture can point to video memory anywhere along a 32K boundary.
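

As a sketch of the address arithmetic this implies, the helpers below compute which 32K aligned aperture
position makes a given video memory address visible, and where that address then appears in the host window
at 0x0A0000. The encoding of the aperture pointer registers is not given in this section, so expressing the
pointer as a count of 32K units is an assumption, as are all names used here.

```c
#include <stdint.h>

#define APERTURE_GRANULE 0x8000u   /* 32K boundary, from the text        */
#define VGA_WINDOW_BASE  0xA0000u  /* host address of the VESA window    */

/* Aperture position, in 32K units, that places video_addr in the window. */
uint32_t aperture_units(uint32_t video_addr)
{
    return video_addr / APERTURE_GRANULE;
}

/* Host address at which video_addr is visible with that aperture position. */
uint32_t aperture_host_addr(uint32_t video_addr)
{
    return VGA_WINDOW_BASE + (video_addr % APERTURE_GRANULE);
}
```

With separate read and write apertures, a copy between two distant regions only needs the two positions to
be set once, as long as each side stays within its window.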


[Figure: Host's View of Memory in VESA Modes. System memory (0x0 to 0x0FFFFF) contains the 64K window at
0x0A0000 to 0x0AFFFF. The Read Aperture and the Write Aperture map this window into video memory
(64 MBytes, 0x0 to 0x3FFFFFF).]


7. 2D


7.1 2D Register Map


Memory Base 0: Offset 0x0100000


Register Name   Address                    Reg            Bits   R/W   Description
status          0x000(0)
[Rows for addresses 0x004 through 0x058 are not legible in the source.]
srcXY           0x05c(92)                  0x17           28:0   R/W   Starting pixel of blt source data
                                                                       Starting position for lines
                                                                       Top-most point for a polygon fill
colorBack       0x060(96)                                              Background color
colorFore       0x064(100)
                0x068(104)                                             Destination width and height for blts and
                                                                       rectangle fills
dstXY           0x06c(108)                                             Starting X and Y of destination for blts
                                                                       End point for lines
command         0x070(112)                 0x1C           31:0         2D command mode & control bits
RESERVED        0x074(116)
RESERVED        0x078(120)
RESERVED        0x07c(124)                 0x1F           31:0         Do not write
launchArea      0x080(128) to 0x0ff(255)   0x20 to 0x3F                Initiates 2D commands
colorPattern    0x100(256) to 0x1fc(508)   0x40 to 0x7F                Pattern Registers (64 entries)
7.2 Register Descriptions


The 2D register set is described in the sections below. 


All 2D registers can be read, and all registers except for the status register are fully write-able. Reading a 
2D register will always return the value that will be used if a new operation is begun without writing a new 
value to that register. This value will either be the last value written to the register, or, if an operation has 
been performed since the value was written, the value after all operations have completed. 


All registers for the 2D section are unsigned unless specified otherwise. 


7.2.1 status Register 

The status register provides a way for the CPU to interrogate the graphics processor about its current state 
and FIFO availability. The status register is read only, but writing to status clears any Napalm generated 
PCI interrupts. For the definition of this register please see section XXX on PCI configuration and 
Initialization registers. 


7.2.2 command Register
The command register sets the command mode for the 2D engine, as well as selecting a number of options. 


Bits (3:0) set the command mode for the 2D drawing engine as shown in the table below. If bit(8) is set, 
the command will be initiated as soon as the command register is written. If bit(8) is cleared, drawing will 
be initiated by a write to the launch area. For descriptions and examples of each command, see the 2D 
launch area section. 


0     Nop - wait for idle
1     Screen to screen blt
2     Screen to screen stretch blt


Setting Bit(9) makes line drawing reversible. If this bit is set, drawing a line from point A to point B will 
result in the same pixels being drawn as drawing a line from point B to point A. 


Bits(11:10) control the value placed in dstXY after each blt or rectangle fill command is executed. If 
bit(10) is 0, dst_x is unchanged. If bit(10) is 1, dst_x gets dst_x + dst_width. If bit(11) is 0, dst_y is 
unchanged. If bit(11) is 1, dst_y gets dst_y + dst_height. 
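

The update rule for dstXY can be written out as a short C function. This is a sketch of the behaviour
described above, not of the hardware implementation; the structure and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { int32_t x, y; } dst_pos;   /* illustrative type for dstXY */

/* Returns the dstXY value left in the register after a blt or rectangle
   fill, given the command register value and the destination size. */
dst_pos next_dst_xy(uint32_t command, dst_pos dst,
                    int32_t width, int32_t height)
{
    if (command & (1u << 10))
        dst.x += width;     /* bit 10: dst_x gets dst_x + dst_width  */
    if (command & (1u << 11))
        dst.y += height;    /* bit 11: dst_y gets dst_y + dst_height */
    return dst;
}
```

Setting only bit(10) lets a sequence of equally sized blts advance left to right across a row without
rewriting dstXY between commands.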


Bit(12) controls whether lines are stippled or solid. If bit(12) is 0, lines will be a solid color. If bit(12) is 1,
lines will be made up of either a two color pattern using colorFore and colorBack or a
transparent stipple using colorFore, as determined by the transparency bit, bit(16).
Bit(13) controls the format of the pattern data. If bit(13) is set to 0, the pattern must be stored in the
destination format. If it is set to 1, the pattern will be stored as a monochrome bitmap; Pattern registers 0
and 1 will be used as an 8x8x1bpp pattern, which will be expanded into the destination format using the
colorBack and colorFore registers. Note that if bit(13) is set, and bit(16) is set to indicate that
monochrome data is transparent, the pattern will be used to determine pixel transparency without regard to
the contents of the ROP register.


Bits(15:14) control the direction of blting during screen-to-screen copies. Note that the corner of the
source and destination rectangles passed in the srcXY and dstXY registers will change depending on the
blting direction. Bit(15) also controls the direction of blting for host-to-screen copies. This can be used to
flip a pixel map so that the top span in host memory is drawn as the bottom span on the screen. Note that
the direction bits only apply to “pure” screen to screen blts, but not to stretch blts. Also, destination and
source color keying, along with color conversions, cannot be used with right to left blts.


Bit(16) controls whether monochrome source bitmaps, and monochrome patterns will be transparent or 
opaque. When bit(16) is 0, source bitmaps are opaque; a 0 in the bitmap will result in colorBack being 
written to the destination. When bit(16) is 1, source bitmaps and monochrome patterns are transparent. In 
this case, a 0 in the bitmap will result in the corresponding destination pixel being left unchanged. 


The X and Y pattern offsets give the coordinates within the pattern of the pixel which corresponds to the 
destination pixel pointed to by the destination base address register. In other words, if a pattern fill is 
performed which covers the origin, pixel (0,0) in the destination pixel map will be written with the color in 
pattern pixel (x_pat_offset, y_pat_offset). 


Bit(23) controls whether the clip0 or clip1 registers will be used for clipping. When bit(23) is 0, clipping 
values from clip0Min and clip0Max will be used; when bit(23) is 1, clipping values from clip1Min and 
clip1Max will be used. 


Bits(31:24) contain ROP0, the ternary ROP that is used when colorkeying is disabled. For more 
information on ROPs, see the description of the rop register. 


Command
Bit     Description
3:0     Command mode
8       Initiate command (1 = initiate command immediately, 0 = wait for launch write)
9       Reversible lines (1 = reversible, 0 = non-reversible)
10      Increment destination x-start after blt or rectangle command (1 = increment, 0 = don't)
11      Increment destination y-start after blt or rectangle command (1 = increment, 0 = don't)
12      Stipple line mode (1 = stippled lines, 0 = solid lines)
13      Pattern format (1 = monochrome, 0 = color)
14      X direction (0 = left to right, 1 = right to left)
15      Y direction (0 = top to bottom, 1 = bottom to top)
16      Transparent monochrome (1 = transparent, 0 = opaque)
23      Clip select (0 = clip0 registers, 1 = clip1 registers)
31:24   ROP0
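

As an illustration of how the fields above combine, the following C sketch assembles a command register value from a mode, a set of option bits, and ROP0. The macro and function names are hypothetical and do not come from any 3dfx header. Mode 3 for host-to-screen blt is taken from the host blt examples in the launch area section, where the command value is 0xCC000003.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the command register option bits described above. */
#define CMD_INITIATE        (1u << 8)   /* start as soon as command is written */
#define CMD_REVERSIBLE      (1u << 9)   /* reversible lines */
#define CMD_INC_DST_X       (1u << 10)  /* dst_x += dst_width after the operation */
#define CMD_INC_DST_Y       (1u << 11)  /* dst_y += dst_height after the operation */
#define CMD_STIPPLE_LINE    (1u << 12)
#define CMD_PATTERN_MONO    (1u << 13)
#define CMD_X_RIGHT_TO_LEFT (1u << 14)
#define CMD_Y_BOTTOM_TO_TOP (1u << 15)
#define CMD_TRANSPARENT     (1u << 16)
#define CMD_CLIP_SELECT_1   (1u << 23)

/* Combine command mode (bits 3:0), option bits, and ROP0 (bits 31:24). */
static uint32_t make_command(uint32_t mode, uint32_t flags, uint8_t rop0)
{
    assert(mode <= 0xFu);                  /* mode must fit in bits(3:0) */
    assert((flags & 0xFF00000Fu) == 0);    /* flags must not overlap mode or ROP0 */
    return mode | flags | ((uint32_t)rop0 << 24);
}
```

With these assumptions, make_command(3, 0, 0xCC) gives 0xCC000003, the value used in the host blt examples.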


7.2.3 commandExtra Register 


This register contains miscellaneous control bits in addition to those in the command register. 


Bits(1:0) enable colorkeying; if a bit is 0, the corresponding colorkeying is disabled. Enabling source 
colorkeying with a monochrome source, or in line, polyline, polygon, or rectangle modes, has no effect. For 
further explanation of these bits, see the description of the colorkey registers. 


If bit(2) is set, the current command, and any following it will not proceed until the next vertical blanking 
period begins. Wait for Vsync should not be used when performing non-DMA host blts. 


If bit(3) is set, only row 0 of the pattern will be used, rather than the usual 8 pattern rows. 


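commandExtra
Bit     Description
1:0     Colorkey enables (0 = colorkeying disabled)
2       Wait for the next vertical blanking period before proceeding
3       Use only row 0 of the pattern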


7.2.4 colorBack and colorFore Registers 


The colorBack and colorFore registers specify the foreground and background colors used in solid-fill and 
monochrome bitmap operations, and operations using a monochrome pattern. The color registers must be 
stored in the destination color format. 


The following table shows the format of the color registers for each destination format. 


P = palette index 
R = red color channel 
G = green color channel 
B = blue color channel 


Dst Format   Bits stored 
8 bpp        0000 0000 0000 0000 0000 0000 PPPP PPPP 
16 bpp       0000 0000 0000 0000 RRRR RGGG GGGB BBBB 
24 bpp       0000 0000 RRRR RRRR GGGG GGGG BBBB BBBB 


colorFore 
Bit     Description 
        foreground color 


colorBack 
Bit     Description 
        background color 


7.2.5 Pattern Registers 

The pattern registers contain an 8 pixel by 8 pixel pattern. The pixels must be either in the color format of 
the destination surface, or in 1bpp (monochrome) format. The pixels are to be written to the pattern 
registers in packed format. So, only registers 0 and 1 will be used for monochrome patterns, registers 0 
through 15 will be used when the destination is 8 bpp, registers 0 through 31 will be used when the 
destination is 16 bpp. 


Pixels should be written into the pattern registers starting with the upper left-hand corner of the pattern, 
proceeding horizontally left to right, and then vertically top to bottom. The least-significant bits of 
pattern[0] should always contain pixel(0,0) of a color pattern. 


The table below gives the bit position of monochrome pixels within the pattern registers. The bits are 
numbered such that bit(0) represents the lsb of a register, and bit(31) represents the msb. 


7.2.5.1 Order of pixel storage in the pattern registers for a monochrome pattern 


pattern(0) 
Row 0    7  6  5  4  3  2  1  0 
Row 1   15 14 13 12 11 10  9  8 
Row 2   23 22 21 20 19 18 17 16 
Row 3   31 30 29 28 27 26 25 24 
pattern(1) 
Row 4    7  6  5  4  3  2  1  0 
Row 5   15 14 13 12 11 10  9  8 
Row 6   23 22 21 20 19 18 17 16 
Row 7   31 30 29 28 27 26 25 24 
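

The lookup implied by the table can be sketched in C as follows. This is only an illustration: the function name is hypothetical, and it assumes that the leftmost pixel of each row is held in the most-significant bit of that row's byte, which is how the table's left-to-right column order reads.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if pixel (x, y) of an 8x8 monochrome pattern is set.
 * pattern[0] holds rows 0-3 and pattern[1] holds rows 4-7, one byte per row.
 * Assumption: x = 0 (leftmost pixel) maps to bit 7 of the row's byte. */
static int mono_pattern_pixel(const uint32_t pattern[2], unsigned x, unsigned y)
{
    assert(x < 8 && y < 8);
    unsigned bit = (y & 3u) * 8u + (7u - x);   /* byte for the row, then bit in byte */
    return (int)((pattern[y >> 2] >> bit) & 1u);
}
```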


pattern(0-64) 
Bit 


7.2.6 srcBaseAddr and dstBaseAddr Registers 


Bits(25:0) of these registers contain the addresses of the pixels at x=0, y=0 on the source and destination 
surfaces in frame-buffer memory. Bit(31) of each register specifies whether the address points to tiled or 
linear memory. 


The srcBaseAddr register is used only for screen-to-screen blts. For host-blts, the alignment of the initial 
pixel sent from the host is determined by the x entry in the srcXY register. 


For YUYV422 and UYVY422 surfaces, the base address must be dword aligned. Thus bits(1:0) of 
srcBaseAddr must be 0. 


srcBaseAddr 
Bit     Description 
25:0    Source base address 
30:26   RESERVED 
31      Source memory is tiled 


dstBaseAddr 
Bit     Description 
25:0    Destination base address 
30:26   RESERVED 
31      Destination memory is tiled 


7.2.7 srcSize and dstSize Registers 


These registers are used only for blts and rectangle fills. They contain the height and width in pixels of the 
source and destination rectangles. The srcSize register will only be used in stretch-blt modes. For 
non-stretched blts, the blt source size will be the same as the blt destination size, determined by the dstSize 
register. 


srcSize 


dstSize 
Bit     Description 
12:0    Blt Destination Width 
15:13   RESERVED 
28:16   Blt Destination Height 
31:29   RESERVED 
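

A minimal C sketch of packing a size value for this register is given below. The function name is hypothetical. The height field is taken from bits 28:16 as listed above; the width is assumed to occupy bits 12:0, since bits 15:13 are reserved.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a width (assumed bits 12:0) and height (bits 28:16) into one dword. */
static uint32_t pack_size(uint32_t width, uint32_t height)
{
    assert(width < (1u << 13) && height < (1u << 13));  /* 13-bit fields */
    return (height << 16) | width;
}
```

For an 11 pixel wide, 7 pixel high character this gives 0x0007000B, which matches the dstSize value written in the first host blt example.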


7.2.8 srcXY and dstXY Registers 


During screen-to-screen blts, the srcXY register sets the position from which blt data will be read. Note 
that the starting position for a blt depends on the direction of the blt as shown in the table below. For lines 
and polylines, srcXY is the starting point of the first line segment. For polygons, srcXY should be the 
topmost vertex of the polygon - that is, the vertex with the lowest y value. If there are multiple vertices 
sharing the lowest y value, srcXY should be set to the leftmost vertex with that y value. Reading the 
srcXY register while in polygon mode will always return the last polygon vertex that the host sent for the 
left side of the polygon. 


The values in srcXY are signed; however, for blts srcXY must contain only positive values. 


During host-to-screen blts, only the x entry of the srcXY register is used. This entry determines the 
alignment of the initial pixel in the blt within the first dword sent from the host. For monochrome bitmaps, 
bits[4:0] are used to determine the bit position within the dword of the initial pixel. For color bitmaps, 
bits[1:0] give the position within the dword of the first byte of pixel data. Host blts are always 
performed left-to-right (the x-direction bit in the command register is ignored), so the offset given will 
always be that of the leftmost pixel in the first span. The alignment of the initial pixel of all spans after the 
first is determined by adding the source stride (from the srcFormat register) to the alignment of the 
previous span. 


For blts, dstXY should be the starting pixel of the destination rectangle as shown in the table below. For 
line and polyline modes, dstXY will be the endpoint of the first line segment. 


In polygon mode, the dstXY register is used to store the last vertex sent for the right side of the polygon. 

If command[8] is set when the command register is written in polygon mode, the value from srcXY will be 
copied to dstXY. If command[8] is cleared, dstXY can be written with the rightmost pixel in the top span 
of the polygon. 


Command[15:14]   Starting pixel 
00               Upper Left-hand corner 
01               Upper Right-hand corner 
10               Lower Left-hand corner 
11               Lower Right-hand corner 
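

The corner selection can be sketched in C as shown below. The type and function names are hypothetical, and the sketch assumes that the "corner" is the inclusive corner pixel of the rectangle (so the right edge is at left + width - 1); the text above does not state this explicitly.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t x, y; } point_t;

/* left/top locate the upper left pixel of the rectangle; width/height are in pixels.
 * x_dir is command bit 14 (1 = right to left), y_dir is command bit 15 (1 = bottom to top).
 * Returns the pixel to place in srcXY or dstXY for that direction. */
static point_t blt_start_corner(int32_t left, int32_t top,
                                int32_t width, int32_t height,
                                int x_dir, int y_dir)
{
    assert(width > 0 && height > 0);
    point_t p;
    p.x = x_dir ? left + width - 1 : left;    /* right to left starts at the right edge */
    p.y = y_dir ? top + height - 1 : top;     /* bottom to top starts at the bottom edge */
    return p;
}
```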


dstXY 


srcXY 
Signed x position of the first source pixel 


15:14 RESERVED 
28:16 Signed y position of the first source pixel 
31:30 RESERVED 


7.2.9 srcFormat and dstFormat Registers 
These registers specify the format and strides of the source and destination surfaces. 


For linear surfaces, the stride of a pixel map is the number of bytes between the starting addresses of 
adjacent scan lines. For these surfaces, the stride is always given in bytes, regardless of the pixel 
format. 


For tiled surfaces, the stride is a tile-stride. Its units are tiles, and only bits(6:0) are used. 


The number of bits per pixel is determined as described by the tables below. The ’32 bpp’ format contains 
24 bits of RGB, along with a byte of unused data; the ’24 bpp’ format is packed 24 bit color. 


Data coming through the host port can be byte swizzled to allow conversion between big and little endian 
formats, as selected by Bit 19 and 20 of the srcFormat register. If both byte and word swizzling are enabled, 
the byte swizzling occurs first, followed by word swizzling. 


The source packing bits control how the stride of the source will be determined during blts. If both bits are 
zero, the stride is set by the stride entry. Otherwise, the stride is based on the width of the blt being 
performed, as shown in the table below. The stride will equal the number of bytes in a row of the rectangle 
being blted plus as many bytes as are required to get the necessary alignment. Packed source and tiled 
surfaces are mutually exclusive - you cannot have packed source on a tiled surface. 
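

The packed stride rule can be sketched in C as follows. The function name is hypothetical, and the set of alignments accepted here (byte, word, dword) is an assumption made for illustration; the actual packing encodings are those of the source packing table.

```c
#include <assert.h>
#include <stdint.h>

/* Stride of a packed source: the bytes in one row of the blt, rounded up to
 * the requested alignment. align_bytes of 1, 2, or 4 is an assumed set of
 * packing choices (byte, word, dword). */
static uint32_t packed_stride(uint32_t width_px, uint32_t bpp, uint32_t align_bytes)
{
    assert(align_bytes == 1 || align_bytes == 2 || align_bytes == 4);
    uint32_t row_bytes = (width_px * bpp + 7u) / 8u;            /* bytes in one row */
    return (row_bytes + align_bytes - 1u) / align_bytes * align_bytes;
}
```

For an 11 pixel wide monochrome bitmap with byte packing, this gives a stride of 2 bytes.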


For YUYV422 and UYVY422 source formats, linear strides must always be a dword multiple. Thus, 
bits(1:0) of the srcFormat register must be 0. 


When necessary, the blt engine will convert source pixels to the destination format. 


When source pixels in 16bpp format are converted to 24bpp or 32bpp, color conversion is performed by 
replicating the msbs of each channel into the extra lsbs required. When pixels are converted from 32bpp 
or 24bpp formats to 16bpp, the extra lsbs are removed from each channel. When any non-32bpp format is 
converted to 32bpp, the 8 msbs of each pixel (i.e. the alpha channel) are filled with zeros. 
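

The conversions described above can be sketched in C as shown below. The function names are hypothetical; the 16bpp layout (5 bits red, 6 bits green, 5 bits blue) is the one given for the color registers. This is an illustration of msb replication and lsb removal, not a statement of how the hardware implements them.

```c
#include <stdint.h>

/* 16bpp (RRRRRGGGGGGBBBBB) to 24/32bpp: replicate the msbs of each channel
 * into the extra lsbs. Bits 31:24 of the result are zero. */
static uint32_t rgb565_to_rgb888(uint16_t c)
{
    uint32_t r = (c >> 11) & 0x1Fu;
    uint32_t g = (c >> 5) & 0x3Fu;
    uint32_t b = c & 0x1Fu;
    r = (r << 3) | (r >> 2);    /* top 3 bits of red fill the low 3 bits */
    g = (g << 2) | (g >> 4);    /* top 2 bits of green fill the low 2 bits */
    b = (b << 3) | (b >> 2);    /* top 3 bits of blue fill the low 3 bits */
    return (r << 16) | (g << 8) | b;
}

/* 24/32bpp to 16bpp: drop the extra lsbs of each channel (bits 31:24 ignored). */
static uint16_t rgb888_to_rgb565(uint32_t c)
{
    uint32_t r = (c >> 16) & 0xFFu;
    uint32_t g = (c >> 8) & 0xFFu;
    uint32_t b = c & 0xFFu;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Replicating the msbs, rather than padding with zeros, maps full-scale 16bpp white (0xFFFF) to full-scale 0xFFFFFF.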


Destination pixel formats are stored as shown in the description of the colorFore and colorBack registers. 
RGB source formats match these; the other source formats are shown in the table below. For monochrome 
source, p0 represents the leftmost pixel on the screen and p31 represents the rightmost. For YUV formats, 
ya represents the left pixel and yb represents the pixel to the right of ya, etc. Thus, ya7 is the msb of the y 
channel for the left pixel and ya0 is the lsb of the y channel for that pixel. In the diagram, the dword with 
the lower address (which will be quadword aligned) is shown first, followed by the dword with the higher 
address. 


Source formats 


Monochrome 
p24 p25 p26 p27 p28 p29 p30 p31  p16 p17 p18 p19 p20 p21 p22 p23  p8 p9 p10 p11 p12 p13 p14 p15  p0 p1 p2 p3 p4 p5 p6 p7 

UYVY 4:2:2 
yb7 yb6 yb5 yb4 yb3 yb2 yb1 yb0  v7 v6 v5 v4 v3 v2 v1 v0  ya7 ya6 ya5 ya4 ya3 ya2 ya1 ya0  u7 u6 u5 u4 u3 u2 u1 u0 

YUYV 4:2:2 
v7 v6 v5 v4 v3 v2 v1 v0  yb7 yb6 yb5 yb4 yb3 yb2 yb1 yb0  u7 u6 u5 u4 u3 u2 u1 u0  ya7 ya6 ya5 ya4 ya3 ya2 ya1 ya0 


Methods of color translation used for Blts 


            1bpp src          8bpp src            16bpp src                     24bpp src                32bpp src                     YUV src 
8bpp dst    color registers   direct or palette   not supported                 not supported 
16bpp dst   color registers                       direct                        lsb removal              lsb removal, alpha dropped    YUV => RGB 
24bpp dst   color registers                       msb duplication               direct                   direct, alpha dropped         YUV => RGB 
32bpp dst   color registers                       msb duplication, zero alpha   rgb direct, zero alpha   direct                        YUV => RGB 

srcFormat 
Bit Description 

Source Stride in bytes or tiles 

RESERVED 

Source color format: 1, 8, 16, 24, 32 bpp RGB, YUYV422, UYVY422 

Host port byte swizzle (1=enable) 

Host port word swizzle (1=enable) 

Source packing 

RESERVED 


dstFormat 
Bit     Description 
        Destination Stride in bytes or tiles 


dstFormat[18:16]   Destination Bpp 


srcFormat[19:16]   Source Format 
0                  1 bpp mono 
5                  32 bpp RGB 
8                  packed 4:2:2 YUYV 
9                  packed 4:2:2 UYVY 


0   Use stride register   srcFormat[13:0] 


7.2.10 clip0Min, clip0Max, clip1Min, and clip1Max Registers 

The clip registers define the maximum and minimum x & y values of pixels that can be written in the 
destination pixel map. There are two sets of clip registers; however, only one set is used at a time, as 
determined by the clip select bit in the command register. 


Clipping is inclusive of the minimum values, and exclusive of the maximum values. Thus if the clip select 
bit is zero, only pixels with x values in the range [clip0Min_x, clip0Max_x) and y values in the range 
[clip0Min_y, clip0Max_y) will be written. 
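

The half-open clip test can be sketched in C as follows. The structure and function names are hypothetical; the comparison itself follows the inclusive-minimum, exclusive-maximum rule stated above.

```c
#include <stdint.h>

/* One set of clip registers (either clip0 or clip1), unpacked into fields. */
typedef struct {
    uint32_t min_x, min_y;   /* inclusive */
    uint32_t max_x, max_y;   /* exclusive */
} clip_rect_t;

/* Returns nonzero if pixel (x, y) would be written under this clip rectangle. */
static int pixel_in_clip(const clip_rect_t *c, uint32_t x, uint32_t y)
{
    return x >= c->min_x && x < c->max_x &&
           y >= c->min_y && y < c->max_y;
}
```

With this convention, a 1024x768 surface is fully covered by a minimum of (0, 0) and a maximum of (1024, 768).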


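clip0Min 
Bit     Description 
11:0    x minimum clip when clip select is 0 
15:12   RESERVED 
27:16   y minimum clip when clip select is 0 
31:28   RESERVED 


clip0Max 
Bit     Description 
11:0    x maximum clip when clip select is 0 
15:12   RESERVED 
27:16   y maximum clip when clip select is 0 
31:28   RESERVED 


clip1Min 
Bit     Description 
11:0    x minimum clip when clip select is 1 
15:12   RESERVED 
27:16   y minimum clip when clip select is 1 
31:28   RESERVED 


clip1Max 
Bit     Description 
11:0    x maximum clip when clip select is 1 
15:12   RESERVED 
27:16   y maximum clip when clip select is 1 
31:28   RESERVED 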


7.2.11 colorkey Registers 
These registers define the range of colors that will be transparent when color keying is enabled. 


Different ROPs are selected for each pixel depending on the result of that pixel's colorkey test. A source pixel 
passes the colorkey test if it is within the inclusive range defined by the srcColorkeyMin and 
srcColorkeyMax registers. A destination pixel passes the colorkey test if it is within the inclusive range 
defined by the dstColorkeyMin and dstColorkeyMax registers. 


For pixels with 8bpp formats, the color indices are compared directly. For pixels with 16, 24, or 32bpp 
formats, each color channel (R, G, and B) is compared separately, and each channel must pass for the 
colorkey test to be passed. In the 32bpp format, the upper 8 bits are ignored during colorkey testing. 
Source colorkeying cannot be enabled if the source format is 1 bpp. 


If colorkeying is disabled for the source or destination surfaces, that colorkey test is failed. 


For further information on ROP selection by the colorkey test results, see the description of the ROP 
register. 


The colorkey test uses the following formula: 
pass = (((color>=colorkey_min) && (color<=colorkey_max)) && colorkey_enable) 
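

For 24bpp and 32bpp pixels, applying the formula per channel can be sketched in C as shown below. The function names are hypothetical. The point of the sketch is that each of R, G, and B is range-checked on its own, so a pixel whose packed value lies between the packed minimum and maximum can still fail.

```c
#include <stdint.h>

/* Range-check one 8-bit channel, selected by its shift within the pixel. */
static int channel_in_range(uint32_t color, uint32_t key_min, uint32_t key_max,
                            unsigned shift)
{
    uint32_t v  = (color   >> shift) & 0xFFu;
    uint32_t lo = (key_min >> shift) & 0xFFu;
    uint32_t hi = (key_max >> shift) & 0xFFu;
    return v >= lo && v <= hi;           /* inclusive range */
}

/* Colorkey test for 24bpp or 32bpp pixels; bits 31:24 are ignored. */
static int colorkey_pass_rgb888(uint32_t color, uint32_t key_min, uint32_t key_max,
                                int colorkey_enable)
{
    return colorkey_enable &&
           channel_in_range(color, key_min, key_max, 16) &&   /* R */
           channel_in_range(color, key_min, key_max, 8)  &&   /* G */
           channel_in_range(color, key_min, key_max, 0);      /* B */
}
```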


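srcColorkeyMin 
Bit     Description 
23:0    minimum color key value for source pixels 
31:24   RESERVED 


srcColorkeyMax 
Bit     Description 
23:0    maximum color key value for source pixels 
31:24   RESERVED 


dstColorkeyMin 
Bit     Description 
23:0    minimum color key value for destination pixels 
31:24   RESERVED 


dstColorkeyMax 
Bit     Description 
23:0    maximum color key value for destination pixels 
31:24   RESERVED 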


7.2.12 rop Register 

This is a set of ternary ROPs used to determine how the source, destination, and pattern pixels will be 
combined. The default ROP, ROP0, is stored in the command register. Which of the four ROPs will be 
used is determined on a per-pixel basis, based on the results of the source and destination colorkey tests, as 
shown in the following table: 


Source Color Key Test    Destination Color Key Test    ROP 


rop 
Bit     Description 
7:0     ROP 1 
15:8    ROP 2 
23:16   ROP 3 
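

A software model of a ternary ROP is sketched below in C. It assumes the conventional Windows-style ROP3 numbering, in which bit ((P << 2) | (S << 1) | D) of the ROP code gives the result for each combination of pattern, source, and destination bits. This document does not spell out the numbering, although the SRC_COPY value 0xCC used in the host blt examples is consistent with it. The function name is hypothetical.

```c
#include <stdint.h>

/* Evaluate an 8-bit ternary ROP on 32 bits at a time.
 * Assumed convention: ROP bit index = (pattern << 2) | (source << 1) | destination. */
static uint32_t apply_rop3(uint8_t rop, uint32_t pat, uint32_t src, uint32_t dst)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (rop & (1u << i)) {
            /* Select the bit positions where (P, S, D) equals combination i. */
            uint32_t p = (i & 4u) ? pat : ~pat;
            uint32_t s = (i & 2u) ? src : ~src;
            uint32_t d = (i & 1u) ? dst : ~dst;
            out |= p & s & d;
        }
    }
    return out;
}
```

Under this convention, 0xCC copies the source, 0xF0 copies the pattern, and 0x5A XORs the pattern with the destination.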


7.2.13 lineStyle register 
The lineStyle register specifies how lines will be drawn. 


The bit pattern used for line stippling can be set to repeat every 1-32 bits, as set by the bit-mask size part of 
this register. The bit-mask size entry gives the number of bits *minus one* that will be used from the 
lineStipple register. Thus, if you want to use 2 bits to represent a dashed line, you would set the bit-mask 
size to 1. 


Each bit from the lineStipple register will determine the color or transparency of from 1-256 pixels. The 
repeat count determines the number of pixels along the line that will be drawn (or skipped) for each bit in 
the line pattern register. The number of pixels associated with each bit of the line pattern *minus one* 
must be written to the repeat count entry. 


The start position gives the offset within the line pattern register for the first pixel drawn in a line. It 
consists of an integer index of the current bit in the line pattern, and a fractional offset that will determine 
the number of pixels that will be drawn using that bit of the pattern. The number of pixels drawn using the 
initial bit in the line pattern will equal the repeat count (i.e. the repeat count entry+1) minus the fractional 
part of the start position. The bit positions within the lineStipple register are numbered starting with the 
lsb at 0, going up to the msb at 31. 


It is illegal to set the integer part of the stipple position to be greater than the bit-mask size. It is illegal to 
set the fractional part to be greater than the repeat count. If either part of the stipple position is too large, 
the behavior of the line drawing engine is undefined. 


Writing the lineStyle register will cause the stipple position to be loaded from the register. If the lineStyle 
register is not written to between the execution of two line commands, the stipple position at the start of the 
new line will be whatever it was after the completion of the last line. If the lineStyle register is read while 
the 2D engine is idle, the stipple position read will always be that which will be used in the next line 
operation - thus, if the lineStyle register has been written since the last stippled line was drawn, the value 
written will be returned; otherwise the value that remained after the last stippled line will be returned. 
Reading the lineStyle register while the 2D engine is not idle will return an indeterminate value for the 
stipple position. 


In the following examples, ‘x’ represents a pixel colored with colorFore, and ‘o’ represents a pixel colored 
with colorBack or that is transparent. ‘S’ shows that the line engine is starting at bit 0 in the lineStipple 
register. ‘_’ shows that the line engine is using a new bit from the lineStipple register. 


7.2.13.1 Example 


Say the bit-mask size is set to 6 (thus, the entry in the register is 5) and the line pattern is: 
lineStipple <= 010111b 


The pixel pattern that will be repeated is: 


repeat_count   repeating pixel pattern 
1              x x x o x o S x x x o x o 
2              xx xx xx oo xx oo S xx xx xx oo xx oo 
3              xxx xxx xxx ooo xxx ooo S xxx xxx xxx ooo xxx ooo 


7.2.13.2 Example 


Say the repeat count is 5 (the register entry is 4), the integer part of the start position is 7, and the fractional 
part of the start position is 2. The color of the first 3 pixels drawn for the line will be determined by bit 7 
in the line pattern register, the next 5 pixels will be determined by bit 8, and so on. 


lineStyle <= 07020904h 
lineStipple <= 1010110111b 


pixels generated, where x=colorFore and o=colorBack: 


xxx_ooooo_xxxxx S xxxxx_xxxxx_xxxxx_ooooo_xxxxx_xxxxx_ooooo_xxxxx_ooooo_xxxxx S 
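

The lineStyle values used in the examples suggest how the register is packed: the repeat count entry in the low byte, the bit-mask size entry in the next byte, then the fractional and integer parts of the start position. The C sketch below packs a value under that assumption. The byte positions are inferred from the example values only, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a lineStyle value. mask_bits is the number of lineStipple bits used (1-32)
 * and pixels_per_bit is the number of pixels drawn per bit (1-256); both are
 * stored minus one. Field positions are an assumption inferred from the examples. */
static uint32_t pack_line_style(uint32_t start_int, uint32_t start_frac,
                                uint32_t mask_bits, uint32_t pixels_per_bit)
{
    assert(mask_bits >= 1 && mask_bits <= 32);
    assert(pixels_per_bit >= 1 && pixels_per_bit <= 256);
    assert(start_int < mask_bits);          /* must not exceed the bit-mask size entry */
    assert(start_frac < pixels_per_bit);    /* must not exceed the repeat count entry */
    return (start_int << 24) | (start_frac << 16) |
           ((mask_bits - 1u) << 8) | (pixels_per_bit - 1u);
}
```

With a start position of 7 and 2, a 10 bit mask, and 5 pixels per bit, this yields 07020904h, the value written in the example above.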


7.2.13.3 Pseudo code for line pixel generation 
Here is the pseudo-code for determining the color of pixels generated by the line engine. In this 
pseudo-code, <repeat_count> and <bit_mask_size> are the register entries, i.e. the counts minus one: 


<bit_position> = <start_position_integer> 
<pixel_position> = <start_position_fraction> 


while (<need_another_pixel>) { 
    if (<line_pattern> & (1 << <bit_position>)) { 
        <new_pixel_color> = <colorFore> 
    } else { 
        if (<transparent>) { 
            <new_pixel_color> = <transparent> 
        } else { 
            <new_pixel_color> = <colorBack> 
        } 
    } 

    if (<pixel_position> == <repeat_count>) { 
        <pixel_position> = 0 
        if (<bit_position> == <bit_mask_size>) { 
            <bit_position> = 0 
        } else { 
            <bit_position> = <bit_position> + 1 
        } 
    } else { 
        <pixel_position> = <pixel_position> + 1 
    } 
} 

lineStyle 


Start position - fractional part 
Start position - integer part 
RESERVED 


7.2.14 lineStipple Register 


The line bit-mask register contains a mask that determines how lines will be drawn. Bits that are ones will 
be drawn with the color in the colorFore register. Bits that are zeros will be filled with the color in the 
colorBack register, or will not be filled, depending on the ‘transparent’ bit in the command register. The 
pattern in the bit mask can be set to repeat every 1-32 bits, as set by the bit-mask size part of the line style 
register. If the bit-mask size is set to less than 31, some of the bits of the line bit-mask will not be used, 
starting with the most-significant bit. For example, if the bit-mask size is set to 7, bits 0-7 of the 
lineStipple register will contain the line bit-mask. 


lineStipple 


7.2.15 bresenhamError registers 


These registers allow the user to specify the initial Bresenham error terms used when performing line 
drawing, polygon drawing, and stretch blts. The Bresenham error terms are signed values. 


Bit 31 of each register determines whether or not the error term given in the lower bits will be used. If 
this bit is 0, the line and stretch blt engines will generate the initial error term automatically. If the bit is set 
to 1, the error term given in bits 16-0 will be used. If a bresenham error register is used, the register should 
be written with bit[31] set to 0 after completion of the operation, so that subsequent operations will not be 
affected. 
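

A C sketch of building a value for one of these registers follows. The names are hypothetical, and the sketch assumes the signed error term is stored as a 17-bit two's complement number in bits 16-0; the text above says only that the term is signed.

```c
#include <assert.h>
#include <stdint.h>

/* Value to write after the operation so later operations use automatic error terms. */
#define BRES_ERROR_AUTO 0u

/* Build a bresError value that forces the given initial error term.
 * Assumption: 17-bit two's complement encoding in bits 16-0, bit 31 = use term. */
static uint32_t make_bres_error(int32_t term)
{
    assert(term >= -65536 && term <= 65535);    /* range of a 17-bit signed value */
    return 0x80000000u | ((uint32_t)term & 0x1FFFFu);
}
```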


bresError0 can be used to set the initial error value for lines, for the left edge of a polygon, and for blt 
stretching along the y-axis. 


bresError1 can be used to set the initial error value for the right edge of a polygon, and for blt stretching 
along the x-axis. 


bresError0 
Bit     Description 
16:0    Signed Bresenham error term for stretch blt y, lines, and left polygon edges 
30:17   RESERVED 
31      Use the error term given in bits 16-0 


bresError1 
Bit     Description 
16:0    Signed Bresenham error term for stretch blt x and right polygon edges 
30:17   RESERVED 
31      Use the error term given in bits 16-0 


7.3 Launch Area 


7.3.1 Screen-to-screen Blt Mode 

Writing the launch area while in screen-to-screen blt mode results in a rectangle being copied from one 
area of display memory to another. The position of the source rectangle is given by the write to the launch 
area. The write to the launch area will be used to fill the srcXY register. 


screenBltLaunch 
Bit     Description 
12:0    X position of the source rectangle 
15:13   RESERVED 
28:16   Y position of the source rectangle 
31:29   RESERVED 


7.3.2 Screen-to-screen Stretch Blt Mode 

Writing the launch area while in screen-to-screen stretch blt mode results in pixels being copied from one 
rectangle in display memory to another of a different size. The write to the launch area will be used to fill 
the srcXY register. The x and y direction bits do not apply to stretch blts, i.e., only top-down, left-to-right 
stretch blts can be done. 


stretchBltLaunch 
Bit     Description 
12:0    X position of the source rectangle 
15:13   RESERVED 
28:16   Y position of the source rectangle 
31:29   RESERVED 


7.3.3 Host-to-screen Blt Mode 


In host-to-screen blt mode, writes to the launch area should contain packed pixels to be used as source data. 
When performing a host-to-screen blt, the blt engine does not generate source addresses. However, it is 
still necessary for the driver to specify the srcFormat, in order for the blt engine to determine how the 
source data is packed. The driver must also write the srcXY register in order to specify the first byte or bit 
to use from the first dword. In monochrome source mode, the 5 lsbs will specify the initial bit. In all other 
modes, the 2 lsbs of srcXY will specify the initial byte of the initial span. The alignment of the first pixel 
of each span after the first is determined by adding the source stride (from the srcFormat register) to the 
alignment of the previous span. 


If more data is written to the launch area than is required for the host blt specified, the extra data will be 
discarded, or may be used in the following host blt, if it is requested while the 2D engine is operating on the 
first hblt. If too little data is written to the launch area, the hblt will be aborted, and pixels on an incomplete 
span at the end of the host blt may or may not be drawn. 


7.3.3.1 Host Blt Example 1 


In this example, the driver is drawing text to a 1024x768x16bpp screen using monochrome bitmaps of 
various widths. The monochrome data is packed, with each row byte aligned. First, it sets up the 
necessary registers before giving the data specific to the first blt: 
colorBack <= the background color 
colorFore <= the foreground color 
dstXY <= the starting position of the first character 
dstBaseAddr <= base address of the primary surface 
clip0Min <= 0x00000000 
clip0Max <= 0xFFFFFFFF 
command <= SRC_COPY || HOST BLT MODE = 0xCC000003 
dstFormat <= 0x00030800 
srcFormat <= 0x00400000 


The command mode is set to host-to-screen blt, with all other features disabled. Since colorkeying is 
disabled, only ROP0 is needed. The format register sets the host format to unswizzled monochrome, using 
byte-packing. This means that the stride will not have to be set for each blt, but will be set to the number 
of bytes required to store the number of pixels in the source width (since this is not a stretch blt, the source 
width equals the destination width, as set later in the dstSize register). The clip registers are set such that 
the results will not be clipped. Although this is a host to screen blt, the srcXY register must be set in order 
to specify the initial alignment of the bitmask. For this example, the source data begins with the lsb of the 
first dword of host data, so the srcXY register is set to zero. 


Now, the driver is ready to start the first blt. It will blt an 11x7 pixel character. 
dstSize <= 0x0007000B 
srcXY <= 0x00000000 
launch <= 0xC0608020 
launch <= 0xC460C060 
launch <= 0x3B806EC0 
launch <= 0x00001100 
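

The number of launch writes in this example follows from the packing: an 11 pixel wide monochrome row needs 2 bytes when byte-packed, 7 rows need 14 bytes, and 14 bytes round up to 4 dwords. A C sketch of that arithmetic is given below. The function name is hypothetical, and the calculation assumes byte packing with srcXY set to zero, as in the example.

```c
#include <stdint.h>

/* Number of dwords the host must write to the launch area for a byte-packed
 * host blt whose data starts at the lsb of the first dword (srcXY = 0). */
static uint32_t host_blt_dwords(uint32_t width_px, uint32_t height, uint32_t bpp)
{
    uint32_t stride = (width_px * bpp + 7u) / 8u;   /* bytes per byte-packed row */
    uint32_t total  = stride * height;              /* bytes for the whole blt */
    return (total + 3u) / 4u;                       /* round up to whole dwords */
}
```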


7.3.3.2 Host Blt Example 2 


In this example, the driver is drawing a pixel map. 
colorBack <= the background color 
colorFore <= the foreground color 
dstXY <= the starting position of the first character 
clip0Min <= 0x00000000 
clip0Max <= 0xFFFFFFFF 
command <= SRC_COPY || HOST BLT MODE = 0xCC000003 
srcFormat <= 0x00240000 


The command mode is set to host-to-screen blt, with all other features disabled. Since colorkeying is 
disabled, only ROP0 is needed. The format register sets the host format to unswizzled monochrome, using 
byte-packing. This means that the stride will not have to be set for each blt, but will be set to the number 
of bytes required to store the number of pixels in the source width (since this is not a stretch blt, the source 
width equals the destination width, as set later in the dstSize register). The clip registers are set such that 
the results will not be clipped. Although this is a host to screen blt, the srcXY register must be set in order 
to specify the initial alignment of the bitmask. For this example, the source data begins with the lsb of the 
first dword of host data, so the srcXY register is set to zero. 


Now, the driver is ready to start the first blt. It will blt an 11x7 pixel character. 
dstSize <= 0x0007000B 
srcXY <= 0x00000000 
launch <= 1st 2 rows 
launch <= 2nd 2 rows 
launch <= 3rd 2 rows 
launch <= last row 


hostBltLaunch 
Bit     Description 
31:0    Source pixel data 

7.3.4 Host-to-screen Stretch Blt Mode 

Writing the launch area while in host-to-screen stretch blt mode results in the pixels written to the launch 
area being stretched onto the destination rectangle. Pixel data for host-to-screen stretch blts is written just as 
for non-stretched host-to-screen blts, except when the destination height differs from the source height. In 
this case, the host must replicate or decimate the source spans to match the number of destination spans 
required. 


hostStretchLaunch 


7.3.5 Rectangle Fill Mode 


Rectangle fill mode is similar to screen-to-screen blt mode, but in this mode, the colorFore register is used 
as source data rather than data from display memory. The size of the rectangle is determined by the 
dstSize register. The write to the launch area gives the position of the destination rectangle, which is used 
to fill the dstXY register. 


rectFillLaunch 
Bit     Description 
12:0    X position of the destination rectangle 
15:13   RESERVED 
28:16   Y position of the destination rectangle 
31:29   RESERVED 


7.3.6 Line Mode 

Writing the launch area while in line mode will write the launch data to the dstXY register and draw a line 
from srcXY to dstXY. After the line has been drawn, dstXY is copied to srcXY. In line mode, all pixels 
in the line will be drawn (as specified by the line style register), including both the start and endpoint. 


The ROP used for lines can use the pattern and the destination, but not source data. colorFore will be used 
in the ROP in place of source data. Source colorkeying must be turned off; destination colorkeying is 
allowed. 


7.3.6.1 Line drawing example 
srcXY <= 0x00020003 // line start-point = (3, 2) 
lineStipple <= 0x00000006 // bit mask is 110 binary 
lineStyle <= 0x02010202 // start position = 2 1/3, repeat count = 2, bit-mask size = 2 
colorBack <= BLACK 
colorFore <= GREY 
command <= LINE MODE || OPAQUE 
launch <= 0x000c0016 // line end-point = (22, 12) 


The line drawn will appear as shown below: 


Figure 1 (stippled line from (3, 2) to (22, 12), drawn relative to the origin) 


lineLaunch 
Bit     Description 
12:0    X position of the line endpoint 
15:13   RESERVED 
28:16   Y position of the line endpoint 
31:29   RESERVED 


7.3.7 Polyline Mode 

Writing the launch area while in polyline mode will write the launch data to the dstXY register and draw a 
line from srcXY to dstXY. After the line has been drawn, dstXY is copied to srcXY. In polyline mode, the 
endpoint of the line (the pixel at dstXY) will not be written. This ensures that each pixel in a 
non-overlapping polyline will be written only once. 


The ROP used for lines can use the pattern and the destination, but not source data. colorFore will be used 
in the ROP in place of source data. Source colorkeying must be turned off; destination colorkeying is 
allowed. 


polylineLaunch 
Bit     Description 
12:0    X position of the line endpoint 
15:13   RESERVED 
28:16   Y position of the line endpoint 
31:29   RESERVED 


7.3.8 Polygon Fill Mode 


The polygon fill mode can be used to draw simple polygons. A polygon may be drawn using the method 
described below if no horizontal span intersects more than two non-horizontal polygon edges. Polygons 
are drawn by first determining the top vertex - that is, the vertex with the lowest y coordinate. The 
coordinates of this vertex should be written to the srcXY register. If multiple vertices share the lowest y 
coordinate, any vertex with the lowest y coordinate may be used as the starting point. If command[8] is 
set when the command register is written and command[3:0] indicates polygon mode, the value in the 
srcXY register will be copied to the dstXY register. The value in the srcXY register determines the 
starting point for the left side of the polygon, while the value in the dstXY register determines the starting 
point for the right side of the polygon. If bit[8] of the command register is not set, the starting position of 
the right side of the polygon can be set by writing to the dstXY register. 


Once the starting vertex is set, as well as the desired colors, ROP, pattern, and options for the polygon fill, 
the polygon can be drawn by writing polygon vertices to the launch area. When multiple vertices share the 
lowest y coordinate, the starting vertex chosen will determine which of those vertices are on the ‘right’ 
edge of the polygon and which are on the ‘left’ edge. Pixels with the same y value as the starting point are 
on the left edge if they are to the left of the starting point. 


For optimum performance, software should determine the leftmost and rightmost of all vertices that share
the lowest y coordinate. The coordinates of the leftmost vertex should be written to srcXY and the
coordinates of the rightmost vertex should be written to dstXY. When the command register is written,
command[8] (the ‘start command’ bit) should be low.


In Polygon fill mode, polygon vertices should be written to the launch area in order of increasing y value. 
Whenever 2 vertices share the same y value, the leftmost vertex *must* be written first. The driver should 
keep track of the last y value sent for the left and right sides. If the y value for the last vertex sent for the 
left side is *less than or equal to* the last y value sent for the right side, the next vertex on the left side 
should be written to the launch area. Otherwise, the next vertex for the right side should be written to the 
launch area. 
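

The ordering rule above can be sketched in Python as follows. This is only a model of the driver-side
bookkeeping, not code from the original document: polygon_launch_order and pack_xy are hypothetical names,
and the sketch assumes the caller has already split the polygon into a left and a right vertex chain, each
listed from top to bottom.

```python
def pack_xy(x, y):
    """Pack (x, y) as a launch word: X in bits 12:0, Y in bits 28:16."""
    return ((y & 0x1FFF) << 16) | (x & 0x1FFF)


def polygon_launch_order(start, left_chain, right_chain):
    """Return [(side, (x, y)), ...] in the order vertices should be launched.

    start       -- top vertex, written to srcXY (and copied to dstXY)
    left_chain  -- remaining left-side vertices, top to bottom
    right_chain -- remaining right-side vertices, top to bottom
    """
    left_y = right_y = start[1]  # last y value sent for each side
    li = ri = 0
    order = []
    while li < len(left_chain) or ri < len(right_chain):
        left_done = li == len(left_chain)
        right_done = ri == len(right_chain)
        # Send a left vertex when the left side's last y is less than or
        # equal to the right side's last y (or the right chain is used up).
        if not left_done and (left_y <= right_y or right_done):
            vertex = left_chain[li]
            li += 1
            left_y = vertex[1]
            order.append(("L", vertex))
        else:
            vertex = right_chain[ri]
            ri += 1
            right_y = vertex[1]
            order.append(("R", vertex))
    return order
```

For a polygon that starts at (4, 1) with left chain (2,4) (3,6) (1,6) (2,8) (5,11) and right chain
(10,1) (11,3) (11,6) (11,8) (8,8) (5,11), the sketch sends one left vertex, three right, three left, one
right, one left, and finally two right vertices.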


The ROP used for filling polygons can use the pattern and the destination, but not source data. colorFore 
will be used in the ROP in place of source data. Source colorkeying must be turned off, destination 
colorkeying is allowed.


Pixels that are on the line that forms the left edge of the polygon will be drawn. Pixels that fall on the line
that forms the right edge of the polygon will not be drawn. For horizontal edges, pixels on a horizontal
polygon edge that is on the ‘top’ of the polygon (i.e. above the edge is outside the polygon and below the
edge is inside the polygon) will be drawn, while pixels on a horizontal polygon edge that is on the bottom
of the polygon will not be drawn.


7.3.8.1 Polygon drawing example 


As an example of polygon drawing, say we are drawing the polygon shown in figure 2. Traversing the 
vertex list in counterclockwise order gives the following list of vertices: 


(4,1) (2,4) (3,6) (1,6) (2,8) (5,11) (8,8) (11,8) (11,6) (11,3) (10,1)


Figures 2a through 2m show the steps in drawing the polygon. Filled circles are vertices of the left 
polygon edge. Open circles are vertices of the right polygon edge. Pixels that are drawn at the end of each 
step are shaded in the figures. 


The polygon engine keeps track of four vertices at a time: the top vertex of the current left polygon edge
(L0), the bottom vertex of the current left polygon edge (L1), the top vertex of the current right polygon
edge (R0), and the bottom vertex of the current right polygon edge (R1). The values of these variables at
each step in drawing the polygon are shown in the figures. The arrows in the figures indicate when a
variable changes between the start of the step and the end of pixel filling for that step.


Figure 2


First, all required registers must be written, including the dstFormat register to specify the drawing
surface, color or pattern registers, and the command register. Write the coordinates of the starting vertex
(4, 1) to the srcXY register:


srcXY <= 0x00010004 
command <= POLYGON MODE || INITIATE COMMAND 


Figure 2a


R1.y>=L1.y, so we have to write the next vertex for the left edge (2, 4):
launch <= 0x00040002 


Figure 2b


R1.y<L1.y, so we write the next vertex for the right edge (10, 1). The drawing engine now has edges for
both the left and right sides, so it will draw all spans up to min(R1.y, L1.y). Because R1.y=R0.y, no
pixels will be drawn, but R0 will be updated to vertex R1:


launch <= 0x0001000a


Figure 2c


R1.y<L1.y, so we again write the next vertex on the right polygon edge (11, 3). Pixels on all spans from
max(L0.y, R0.y) to min(L1.y, R1.y)-1 will be drawn, as shown below. Because R1.y<L1.y, R0 is updated
to R1.


launch <= 0x0003000b 


Figure 2d


R1.y<L1.y, so we write the next vertex on the right edge (11, 6). Again, pixels on all spans from max(L0.y,
R0.y) to min(L1.y, R1.y)-1 will be drawn. This time R1.y>L1.y, however, so L0 is updated to L1.


launch <= 0x0006000b


Figure 2e


R1.y>=L1.y, so we write the next vertex on the left edge (3, 6). L1.y=R1.y, so R0 is updated to R1 and L0
is updated to L1.


launch <= 0x00060003 


Figure 2f


R1.y>=L1.y, so we write the next vertex on the left edge (1, 6). L1.y=R1.y, so R0 is updated to R1 and L0
is updated to L1. R1 did not change, so updating R0 to R1 has no effect.


launch <= 0x00060001


Figure 2g


R1.y>=L1.y, so we again write the next vertex on the left edge (2, 8). L1.y>R1.y, so R0 is updated to R1,
again with no effect.


launch <= 0x00080002 


Figure 2h


R1.y<L1.y, so we write the next vertex on the right edge (11, 8). L1.y=R1.y, so R0 is updated to R1, and
L0 is updated to L1.


launch <= 0x0008000b


Figure 2i


R1.y>=L1.y, so we write the next vertex on the left edge (5, 11). L1.y>R1.y, so R0 is updated to R1.


launch <= 0x000b0005


Figure 2j


R1.y<L1.y, so we write the next vertex on the right edge (8, 8). L1.y>R1.y, so R0 is updated to R1, but no
pixels are drawn.


launch <= 0x00080008


Figure 2k


R1.y<L1.y, so we write the next vertex on the right edge. This is the final vertex in the polygon, which
doesn’t have a horizontal span at the bottom, so this vertex is the same as the last vertex for the left edge
(5, 11). L1.y=R1.y, so R0 is updated to R1, and L0 is updated to L1. No pixels on the final span are drawn
(this would be true even if L1.x did not equal R1.x). If the launch area is written again before any registers
are written, the polygon engine will begin a new polygon starting at (5,11).


launch <= 0x000b0005


Figure 2m


polygonLaunch


Bit    Description
12:0   X position of a polygon vertex
15:13  RESERVED
28:16  Y position of a polygon vertex
31:29  RESERVED


7.4 Miscellaneous 2D 


7.4.1 Write Sgram/Sdram Mode Register 
Executing this command causes the value in colorFore[13:0] to be set as the sgram/sdram mode register via
a special bus cycle in the memory controller.


SGRAM mode register


Bit    Description
       CAS latency
9      Write burst length (0=burst, 1=single bit)


The colorFore register is mapped to the Sgram/Sdram pins as follows:


7.4.2 Write Sgram Color Register 


Executing this command causes the value in colorFore[31:0] to be set as the sgram color register via a
special bus cycle in the memory controller. Since Napalm has a 128-bit wide bus, the register is replicated
across the four sets of sgram memories.


7.4.3. Write Sgram Mask Register 


Executing this command causes the value in colorFore[31:0] to be set as the sgram mask register via a
special bus cycle in the memory controller. Since Napalm has a 128-bit wide bus, the register is replicated
across the four sets of sgram memories.


8. 3D Memory Mapped Register Set 
A 4Mbyte (22-bit) FBI memory mapped register address is divided into the following fields: 




The chip field selects one or more of the Napalm units (FBI and/or TREX) to be accessed. Each bit in the
chip field selects one chip for writing, with FBI controlled by the lsb of the chip field, and TREX#2
controlled by the msb of the chip field. Note that a chip field value of 0x0 selects all chips. The following
table shows the chip field mappings:


Chip Field    Accessed


By utilizing the different chip fields, software can precisely control the data presented to individual chips
which compose the Napalm graphics subsystem. Note that for reads, the chip field is ignored, and read
data is always read from FBI. The register field selects the register to be accessed from the table below.
All accesses to the memory mapped registers must be 32-bit accesses. No byte (8-bit) or halfword (16-bit)
accesses are allowed to the memory mapped registers, so the byte (2-bit) field of all memory mapped
register accesses must be 0x0. As a result, to modify individual bits of a 32-bit register, the entire 32-bit
word must be written with valid bits in all positions.


The table below shows the Napalm register set. The register set shown below is the address map when
triangle register address aliasing (remapping) is disabled (fbiInit3(0)=0). The chip column
illustrates which registers are stored in which chips. For the registers which are stored in TREX, the %
symbol specifies that the register is unconditionally written to TREX regardless of the chip address.
Similarly, the * symbol specifies that the register is only written to a given TREX if specified in the chip
address. The R/W column illustrates the read/write status of individual registers. Reading from a register
which is “write only” returns undefined data. Also, reading from a register that is TREX specific returns
undefined data. Reads from all other memory mapped registers only contain valid data in the bits stored
by the registers, and undefined/reserved bits in a given register must be masked by software. The sync
column indicates whether the graphics processor must wait for the current command to finish before
loading a particular register from the FIFO. A “yes” in the sync column means the graphics processor will
flush the data pipeline before loading the register -- this will result in a small performance degradation
when compared to those registers which do not need synchronization. The FIFO column indicates
whether a write to a particular register will be pushed into the PCI bus FIFO. Care must be taken when
writing to those registers not pushed into the FIFO in order to prevent race conditions between FIFOed and
non-FIFOed registers. Also note that reads are not pushed into the PCI bus FIFO, and reading FIFOed
registers will return the current value of the register, irrespective of pending writes to the register present in
the FIFO.


Memory Base 0: Offset 0x0200000 


Register Name   | Address                  | Reg Num      | Bits         | Chip      | R/W | Sync?/Fifo? | Description
status          |                          |              |              | FBI       | R   | No / n/a    | Napalm Status
startT          |                          |              |              |           |     |             | Starting T/W parameter (14.18 format)
startW          |                          |              |              |           |     |             | Starting 1/W parameter (2.30 format)
dRdX            | 0x040(64)                |              |              |           |     |             | Change in Red with respect to X (12.12 format)
dGdX            | 0x044(68)                |              |              |           |     |             | Change in Green with respect to X (12.12 format)
dBdX            | 0x048(72)                |              |              |           |     |             | Change in Blue with respect to X (12.12 format)
dZdX            | 0x04c(76)                |              |              |           |     |             | Change in Z with respect to X (20.12 format)
dRdY            | 0x060(96)                |              |              |           |     |             |
dAdY            | 0x070(112)               |              |              |           |     |             |
dSdY            | 0x074(116)               |              |              |           |     |             | Change in S/W with respect to Y (14.18 format)
dTdY            |                          |              |              |           |     |             | Change in T/W with respect to Y (14.18 format)
dWdY            |                          |              |              |           |     |             | Change in 1/W with respect to Y (2.30 format)
fvertexCy       |                          |              |              |           |     |             | Vertex C y-coordinate location (floating point)
fstartW         |                          |              |              |           |     |             | Starting 1/W parameter (floating point)
fdRdY           |                          |              |              |           |     |             | Change in Red with respect to Y (floating point)
fdGdY           |                          |              |              |           |     |             | Change in Green with respect to Y (floating point)
fdWdY           |                          |              |              |           |     |             | Change in 1/W with respect to Y (floating point)
ftriangleCMD    |                          |              |              |           |     |             | Execute TRIANGLE command (floating point)
fbzColorPath    | 0x104(260)               | 0x41         |              |           |     |             |
fogMode         | 0x108(264)               |              |              |           |     |             |
alphaMode       | 0x10c(268)               |              |              |           |     |             |
fbzMode         | 0x110(272)               |              |              |           |     |             |
lfbMode         | 0x114(276)               |              |              |           |     |             |
clipLeftRight   | 0x118(280)               |              |              |           |     |             |
clipTopBottom   | 0x11c(284)               |              |              |           |     |             |
nopCMD          | 0x120(288)               |              |              |           |     |             |
fastfillCMD     | 0x124(292)               |              |              |           |     |             |
swapbufferCMD   | 0x128(296)               |              |              |           |     |             |
fogColor        | 0x12c(300)               |              |              |           |     |             |
zaColor         | 0x130(304)               |              |              |           |     |             |
chromaKey       | 0x134(308)               |              |              |           |     |             |
chromaRange     | 0x138(312)               |              |              |           |     |             |
userIntrCMD     | 0x13c(316)               |              |              |           |     |             |
stipple         | 0x140(320)               |              |              |           |     |             |
color0          | 0x144(324)               |              |              |           |     |             |
color1          | 0x148(328)               |              |              |           |     |             |
fbiPixelsIn     | 0x14c(332)               |              |              |           |     |             |
fbiChromaFail   | 0x150(336)               |              |              |           |     |             |
fbiZfuncFail    | 0x154(340)               |              |              |           |     |             |
fbiAfuncFail    | 0x158(344)               |              |              |           |     |             | Pixel Counter (Number pixels failed Alpha test)
fbiPixelsOut    | 0x15c(348)               |              |              |           |     |             | Pixel Counter (Number pixels drawn)
fogTable        | 0x160(352) to 0x1dc(476) | 0x58 to 0x77 | 31:0         |           |     | Yes / Yes   | Fog Table
colBufferAddr   | 0x1ec(492)               |              |              | FBI       | R/W | Yes / Yes   | Color Buffer Base Address
colBufferStride | 0x1f0(496)               |              |              |           |     |             |
auxBufferAddr   | 0x1f4(500)               |              |              |           |     |             |
auxBufferStride | 0x1f8(504)               |              |              |           |     |             |
clipLeftRight1  | 0x200(512)               |              |              |           |     |             |
clipTopBottom1  | 0x204(516)               |              |              |           |     |             |
swapPending     | 0x24c(588)               |              |              |           |     |             | Swap buffer pending
leftOverlayBuf  | 0x250(592)               |              |              |           |     |             | Left Overlay address
rightOverlayBuf | 0x254(596)               |              |              |           |     |             |
fbiSwapHistory  | 0x258(600)               |              |              |           |     |             |
fbiTrianglesOut | 0x25c(604)               |              |              |           |     |             |
                | 0x260(608)               |              |              |           |     |             | Triangle setup mode
                | 0x264(612)               |              |              |           |     |             | Triangle setup X
                | 0x268(616)               |              |              | FBI+TREX* | W   | No / Yes    | Triangle setup Y
sARGB           | 0x26c(620)               |              |              |           | W   | No / Yes    | Triangle setup Alpha, Red, Green, Blue
                | 0x270(624)               |              |              |           |     |             | Triangle setup Red value
                | 0x274(628)               |              |              |           |     |             | Triangle setup Green value
                | 0x278(632)               |              |              |           |     |             | Triangle setup Blue value
reserved        | from 0x2a8(680)          | from 0xAA    | n/a          |           | n/a |             |
                |                          |              |              |           |     |             | TREX Hardware Initialization (register 1)
nccTable0       | 0x324(804) to 0x350(848) | 0xC9 to 0xD4 | 31:0 or 26:0 | TREX      |     | Yes / Yes   | Narrow Channel Compression Table 0 (12 entries)
nccTable1       | 0x354(852) to 0x380(896) | 0xD5 to 0xE0 | 31:0 or 26:0 |           |     |             | Narrow Channel Compression Table 1 (12 entries)
reserved        | 0x384(900) to 0x3fc(1020)| 0xE1 to 0xFF | n/a          |           |     |             |

[Blank cells and the remaining rows of this table are not recoverable from the source.]

The triangle parameter registers are aliased to a different address mapping to improve PCI bus throughput.
When the upper bit of the wrap field in the PCI address is 0x1 (pci_ad[21]=1), the following table shows the
addresses for the triangle parameter registers.


Register Name   | Address    | Reg Num | Bits | Chip     | R/W | Sync?/Fifo? | Description
                | 0x038(56)  |         |      | FBI      |     |             |
                | 0x090(144) |         |      | FBI+TREX |     | No / Yes    |
                | 0x094(148) |         |      | FBI+TREX |     | No / Yes    |
                | 0x098(152) |         |      | FBI+TREX |     | No / Yes    |
ftriangleCMD    | 0x100(256) |         |      | FBI+TREX |     | No / Yes    | Execute TRIANGLE command (floating point)
startG          |            |         |      |          |     |             | Starting Green parameter (12.12 format)
startA          |            |         |      |          |     |             | Starting Alpha parameter (12.12 format)
startS          |            |         |      |          |     |             | Starting S/W parameter (14.18 format)
dWdY            |            |         |      |          |     |             | Change in 1/W with respect to Y (2.30 format)
triangleCMD     |            |         |      |          |     |             | Execute TRIANGLE command (sign bit)
fvertexAx       |            |         |      |          |     |             | Vertex A x-coordinate location (floating point)
fvertexAy       |            |         |      |          |     |             | Vertex A y-coordinate location (floating point)
fvertexBx       |            |         |      |          |     |             | Vertex B x-coordinate location (floating point)
fvertexBy       |            |         |      |          |     |             | Vertex B y-coordinate location (floating point)
fvertexCx       |            |         |      |          |     |             | Vertex C x-coordinate location (floating point)
fvertexCy       |            |         |      |          |     |             | Vertex C y-coordinate location (floating point)
fstartR         |            |         |      |          |     |             | Starting Red parameter (floating point)
fdAdX           |            |         |      |          |     |             | Change in Alpha with respect to X (floating point)

[Blank cells and the remaining rows of this table are not recoverable from the source.]
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8.1 status Register 


The status register provides a way for the CPU to interrogate the graphics processor about its current state 
and FIFO availability. The status register is read only, but writing to status clears any Napalm generated 
PCI interrupts. 


Bit    Description
5:0    PCI FIFO freespace (0x3f=FIFO empty). Default is 0x3f.
7      FBI graphics engine busy (0=engine idle, 1=engine busy)
8      TREX busy (0=engine idle, 1=engine busy)
9      Napalm busy (0=idle, 1=busy)
       2D busy (0=idle, 1=busy)


Bits(5:0) show the number of entries available in the internal host FIFO. The internal host FIFO is 64
entries deep. The FIFO is empty when bits(5:0)=0x3f. Bit(6) is the state of the monitor vertical retrace
signal, and is used to determine when the monitor is being refreshed. Bit(7) of status is used to determine
if the graphics engine of FBI is active. Note that bit(7) only determines if the graphics engine of FBI is
busy -- it does not include information as to the status of the internal PCI FIFOs. Bit(8) of status is used to
determine if TREX is busy. Note that bit(8) of status is set if any unit in TREX is not idle -- this includes
the graphics engine and all internal TREX FIFOs. Bit(9) of status determines if all units in the Napalm
system (including graphics engines, FIFOs, etc.) are idle. Bit(9) is set when any internal unit in Napalm is
active (e.g. graphics is being rendered or any FIFO is not empty). When the Memory FIFO is enabled,
bits(27:12) show the number of entries available in the Memory FIFO. Depending upon the amount of
frame buffer memory available, a maximum of 65,536 entries may be stored in the Memory FIFO. The
Memory FIFO is empty when bits(27:12)=0xffff. Bits(30:28) of status track the number of outstanding
SWAPBUFFER commands. When a SWAPBUFFER command is received from the host CPU, bits(30:28)
are incremented -- when a SWAPBUFFER command completes, bits(30:28) are decremented. Bit(31) of
status is used to monitor the status of the PCI interrupt signal. If Napalm generates a vertical retrace
interrupt (as defined in pciInterrupt), bit(31) is set and the PCI interrupt signal line is activated to generate
a hardware interrupt. An interrupt is cleared by writing to status with “don’t-care” data. NOTE THAT
BIT(31) IS CURRENTLY NOT IMPLEMENTED IN HARDWARE, AND WILL ALWAYS RETURN 0X0.
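

The bit assignments described above can be unpacked as in the following Python sketch. It is a minimal
illustration that follows the field positions given in the paragraph; decode_status and the dictionary keys
are hypothetical names, and reading the register itself is outside the sketch.

```python
def decode_status(status: int) -> dict:
    """Split a 32-bit status value into the fields described in the text."""
    return {
        "pci_fifo_free": status & 0x3F,              # bits 5:0, 0x3f = empty
        "vertical_retrace": (status >> 6) & 1,       # bit 6
        "fbi_busy": bool((status >> 7) & 1),         # bit 7
        "trex_busy": bool((status >> 8) & 1),        # bit 8
        "napalm_busy": bool((status >> 9) & 1),      # bit 9
        "mem_fifo_free": (status >> 12) & 0xFFFF,    # bits 27:12, 0xffff = empty
        "swaps_pending": (status >> 28) & 0x7,       # bits 30:28
        "pci_interrupt": (status >> 31) & 1,         # bit 31
    }


def is_idle(status: int) -> bool:
    """True when bit 9 reports that no unit or FIFO is active."""
    return not decode_status(status)["napalm_busy"]
```

A driver would typically poll such a decoder until napalm_busy clears before touching registers that are not
pushed through the FIFO.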


8.2 intrCtrl Register 


The intrCtrl register controls the interrupt capabilities of Napalm. Bits 1:0 enable video horizontal sync
signal generation of interrupts. Generated horizontal sync interrupts are detected by the CPU by reading
bits 7:6 of intrCtrl. Bits 3:2 enable video vertical sync signal generation of interrupts. Generated vertical
sync interrupts are detected by the CPU by reading bits 9:8 of intrCtrl. Bit 4 of intrCtrl enables
generation of interrupts when the frontend PCI FIFO is full. Generated PCI FIFO Full interrupts are
detected by the CPU by reading bit 10 of intrCtrl. PCI FIFO full interrupts are generated when intrCtrl bit
4 is set and the number of free entries in the frontend PCI FIFO drops below the value specified in fbiInit0
bits(10:6). Bit 5 of intrCtrl enables generation of interrupts by the user interrupt command
USERINTERRUPT. Generated user interrupts are detected by the CPU by reading bit 11 of intrCtrl. The tag
associated with a generated user interrupt is stored in bits 19:12 of intrCtrl.


Generated interrupts are cleared by writing a 0 to the bit signaling a particular interrupt was generated and
writing a 1 to intrCtrl bit(31). For example, a PCI FIFO full generated interrupt is cleared by writing a 0
to bit 10 of intrCtrl, and a generated user interrupt is cleared by writing a 0 to bit 11 of intrCtrl. For both
cases, bit 31 of intrCtrl must be written with the value 1 to clear the external PCI interrupt. Care must be
taken when clearing interrupts not to accidentally overwrite the interrupt mask bits (bits 5:0 of intrCtrl)
which enable generation of particular interrupts.
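

A minimal Python sketch of the clearing sequence follows. It computes the value to write back from the value
last read, so the mask bits 5:0 are carried over unchanged. The function name intr_clear_value and the
constants are hypothetical, and the sketch assumes that writing a flag bit back as 1 leaves that flag
untouched, which the text above does not state.

```python
PCI_FIFO_FULL_FLAG = 10      # "PCI FIFO Full interrupt generated" bit
USER_INTR_FLAG = 11          # "User Interrupt Command interrupt generated" bit
CLEAR_EXTERNAL = 1 << 31     # must be written as 1 to clear the PCI interrupt


def intr_clear_value(current: int, flag_bit: int) -> int:
    """Return the intrCtrl value that clears one generated-interrupt flag.

    `current` is the value last read from intrCtrl. The enable bits 5:0 are
    copied from it unchanged; only `flag_bit` is forced to 0 and bit 31 to 1.
    """
    value = current & ~(1 << flag_bit)   # write 0 to the flag being cleared
    value |= CLEAR_EXTERNAL              # write 1 to bit 31
    return value & 0xFFFFFFFF
```

For example, with both flags set and enable bits 5:4 on (0x00000c30), clearing the PCI FIFO full interrupt
writes 0x80000830, which keeps the enable bits and the user interrupt flag as they were read.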


Note that writes to the intrCtrl register are not pushed on the PCI frontend FIFO, so writes to intrCtrl are 
processed immediately. Since intrCtrl is not FIFO’ed, writes to intrCtrl may be processed out-of-order 
with respect to other queued writes in the PCI and memory-backed FIFOs. 


Bit    Description
0      Horizontal Sync (rising edge) interrupts enable (1=enable). Default is 0.
2      Vertical Sync (rising edge) interrupts enable (1=enable). Default is 0.
3      Vertical Sync (falling edge) interrupts enable (1=enable). Default is 0.
10     PCI FIFO Full interrupt generated (1=interrupt generated).
11     User Interrupt Command interrupt generated (1=interrupt generated).
19:12  User Interrupt Command Tag. Read only.
       VMI interrupts enable (1=enable). Default is 0.
       Hole counting interrupt generated (1=interrupt generated).
       VMI interrupt generated (1=interrupt generated).
       Hole counting interrupts enable (1=enable). Default is 0.
31     External pin pci_inta value, active low (0=PCI interrupt is active, 1=PCI interrupt is
       inactive)


8.3 vertex and fvertex Registers 


The vertexAx, vertexAy, vertexBx, vertexBy, vertexCx, vertexCy, fvertexAx, fvertexAy, fvertexBx,
fvertexBy, fvertexCx, and fvertexCy registers specify the x and y coordinates of a triangle to be rendered.
There are three vertices in a Napalm triangle, with the AB and BC edges defining the minor edge and the
AC edge defining the major edge. The diagram below illustrates two typical triangles:


[Figure: two typical triangles. Each is labeled from top to bottom with (vertexAx, vertexAy),
(vertexBx, vertexBy), and (vertexCx, vertexCy); the AB and BC sides are marked Minor Edge and the AC
side is marked Major Edge.]


The fvertex registers are floating point equivalents of the vertex registers. Napalm automatically converts 
both the fvertex and vertex registers into an internal fixed point notation used for rendering. 


vertexAx, vertexAy, vertexBx, vertexBy, vertexCx, vertexCy


Bit    Description
15:0   Vertex coordinate information (fixed point two’s complement 12.4 format)


fvertexAx, fvertexAy, fvertexBx, fvertexBy, fvertexCx, fvertexCy


Bit    Description
31:0   Vertex coordinate information (IEEE 32-bit single-precision floating point format)
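

To make the two encodings concrete, the Python sketch below converts a coordinate either to two's complement
fixed point or to its IEEE 754 single-precision bit pattern. The helper names to_fixed and float_bits are
hypothetical. The sketch assumes that an m.n format means m integer bits (sign included) followed by n
fraction bits, and it rounds to the nearest representable value, which the text does not specify.

```python
import struct


def to_fixed(value: float, int_bits: int, frac_bits: int) -> int:
    """Encode `value` as two's complement fixed point, e.g. 12.4 or 12.12."""
    width = int_bits + frac_bits
    scaled = round(value * (1 << frac_bits))      # Python rounds half to even
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lowest <= scaled <= highest:
        raise OverflowError(f"{value} does not fit in {int_bits}.{frac_bits}")
    return scaled & ((1 << width) - 1)            # keep only `width` bits


def float_bits(value: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of `value`."""
    return struct.unpack("<I", struct.pack("<f", value))[0]
```

Under these assumptions a coordinate of 10.5 would be written as 0x00a8 to a 12.4 vertex register, or as
0x41280000 to the corresponding fvertex register.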


8.4 startR, startG, startB, startA, fstartR, fstartG, fstartB, and fstartA Registers 


The startR, startG, startB, startA, fstartR, fstartG, fstartB, and fstartA registers specify the starting 
color information (red, green, blue, and alpha) of a triangle to be rendered. The start registers must 
contain the color values associated with the A vertex of the triangle. The fstart registers are floating point 
equivalents of the start registers. Napalm automatically converts both the start and fstart registers into an 
internal fixed point notation used for rendering. 


startR, startG, startB, startA


Bit    Description
23:0   Starting Vertex-A Color information (fixed point two’s complement 12.12 format)


fstartR, fstartG, fstartB, fstartA


Bit    Description
31:0   Starting Vertex-A Color information (IEEE 32-bit single-precision floating point format)


8.5 startZ and fstartZ registers 


The startZ and fstartZ registers specify the starting Z information of a triangle to be rendered. The startZ
register must contain the Z value associated with the A vertex of the triangle. The fstartZ register is a
floating point equivalent of the startZ register. Napalm automatically converts both the startZ and
fstartZ registers into an internal fixed point notation used for rendering.


startZ


Bit    Description
31:0   Starting Vertex-A Z information (fixed point two’s complement 20.12 format)


fstartZ


Bit    Description
31:0   Starting Vertex-A Z information (IEEE 32-bit single-precision floating point format)


8.6 startS, startT, fstartS, and fstartT Registers 


The startS, startT, fstartS, and fstartT registers specify the starting S/W and T/W texture coordinate 
information of a triangle to be rendered. The start registers must contain the texture coordinates associated 
with the A vertex of the triangle. Note that the S and T coordinates used by Napalm for rendering must be 
divided by W prior to being sent to Napalm (i.e. Napalm iterates S/W and T/W prior to perspective 
correction). During rendering, the iterated S and T coordinates are (optionally) divided by the iterated W 
parameter to perform perspective correction. The fstart registers are floating point equivalents of the start 
registers. Napalm automatically converts both the start and fstart registers into an internal fixed point 
notation used for rendering. 


startS, startT


Bit    Description
31:0   Starting Vertex-A Texture coordinates (fixed point two’s complement 14.18 format)


fstartS, fstartT


Bit    Description
31:0   Starting Vertex-A Texture coordinates (IEEE 32-bit single-precision floating point
       format)
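

The division by W described above can be illustrated with a short Python sketch. It shows what software
would compute for vertex A before loading the start registers, and the per-pixel division that recovers S
and T. The function names are hypothetical, and the sketch works in floating point only; conversion to the
14.18 register format is left out.

```python
def texture_start_params(s: float, t: float, w: float):
    """Return (S/W, T/W, 1/W) for a vertex with texture coords (s, t) and W = w."""
    if w == 0:
        raise ZeroDivisionError("W must be non-zero")
    one_over_w = 1.0 / w
    return s * one_over_w, t * one_over_w, one_over_w


def perspective_correct(s_over_w: float, t_over_w: float, one_over_w: float):
    """Divide the iterated S/W and T/W by the iterated 1/W to recover (S, T)."""
    return s_over_w / one_over_w, t_over_w / one_over_w
```

For a vertex with S=64, T=32 and W=4, the values sent would be 16, 8 and 0.25, and dividing the first two by
the third gives back (64, 32).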


8.7. startW and fstartW registers 


The startW and fstartW registers specify the starting 1/W information of a triangle to be rendered. The
startW register must contain the 1/W value associated with the A vertex of the triangle. Note that the W
value used by Napalm for rendering is actually the reciprocal of the 3D-geometry-calculated W value (i.e.
Napalm iterates 1/W prior to perspective correction). During rendering, the iterated S and T coordinates
are (optionally) divided by the iterated W parameter to perform perspective correction. The fstartW
register is a floating point equivalent of the startW register. Napalm automatically converts both the
startW and fstartW registers into an internal fixed point notation used for rendering.


startW


Bit    Description
31:0   Starting Vertex-A W information (fixed point two’s complement 2.30 format)


fstartW


Bit    Description
31:0   Starting Vertex-A W information (IEEE 32-bit single-precision floating point format)
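

Because a 2.30 value has only two integer bits, the reciprocal must stay within a narrow range. The Python
sketch below encodes 1/W for the startW register and rejects values that do not fit. The name
encode_one_over_w is hypothetical, and the sketch assumes the two integer bits include the sign, so the
representable range is from -2 up to just under 2.

```python
FRAC_BITS = 30   # fraction bits of the 2.30 format
WIDTH = 32       # total register width in bits


def encode_one_over_w(w: float) -> int:
    """Return 1/w as a 32-bit two's complement 2.30 value."""
    if w == 0:
        raise ZeroDivisionError("W must be non-zero")
    scaled = round((1.0 / w) * (1 << FRAC_BITS))
    if not -(1 << (WIDTH - 1)) <= scaled < (1 << (WIDTH - 1)):
        raise OverflowError("1/W does not fit in 2.30 format")
    return scaled & 0xFFFFFFFF
```

Under this assumption W=1 encodes as 0x40000000 and W=4 as 0x10000000, while W=0.5 (1/W = 2) is rejected.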


8.8 dRdX, dGdX, dBdX, dAdX, fdRdX, fdGdX, fdBdX, and fdAdX Registers 


The dRdX, dGdX, dBdX, dAdX, fdRdX, fdGdX, fdBdX, and fdAdX registers specify the change in the
color information (red, green, blue, and alpha) with respect to X of a triangle to be rendered. As a triangle
is rendered, the d?dX registers are added to the internal color component registers when the pixel
drawn moves from left-to-right, and are subtracted from the internal color component registers when the
pixel drawn moves from right-to-left. The fd?dX registers are floating point equivalents of the d?dX
registers. Napalm automatically converts both the d?dX and fd?dX registers into an internal fixed point
notation used for rendering.


dRdX, dGdX, dBdX, dAdX


Bit    Description
23:0   Change in color with respect to X (fixed point two’s complement 12.12 format)


fdRdX, fdGdX, fdBdX, fdAdX


Bit    Description
31:0   Change in color with respect to X (IEEE 32-bit single-precision floating point format)
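

The add-or-subtract behaviour can be modelled with a few lines of Python. The sketch below walks one color
component along a span in 12.12 fixed point and returns the integer part at each pixel. The function
span_colors is a hypothetical illustration; it does not model clamping, wrap-around, or the internal
precision of the hardware, none of which are described here.

```python
FRAC_BITS = 12  # fraction bits of the 12.12 color format


def span_colors(start: int, dcdx: int, count: int, direction: int = 1):
    """Iterate one color component across `count` pixels.

    start, dcdx -- 12.12 fixed point values (starting color and d?dX)
    direction   -- +1 when the pixel moves left-to-right (d?dX is added),
                   -1 when it moves right-to-left (d?dX is subtracted)
    Returns the integer part of the component at each pixel.
    """
    values = []
    current = start
    for _ in range(count):
        values.append(current >> FRAC_BITS)
        current += dcdx if direction > 0 else -dcdx
    return values
```

Starting at 10.0 with a gradient of 0.5 (0x800), four pixels drawn left-to-right take the integer values
10, 10, 11, 11.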


8.9 dZdX and fdZdX Registers 


The dZdX and fdZdX registers specify the change in Z with respect to X of a triangle to be rendered. As a
triangle is rendered, the dZdX register is added to the internal Z register when the pixel drawn moves
from left-to-right, and is subtracted from the internal Z register when the pixel drawn moves from
right-to-left. The fdZdX register is a floating point equivalent of the dZdX register. Napalm automatically
converts both the dZdX and fdZdX registers into an internal fixed point notation used for rendering.


dZdX


Bit    Description
31:0   Change in Z with respect to X (fixed point two’s complement 20.12 format)


fdZdX


Bit    Description
31:0   Change in Z with respect to X (IEEE 32-bit single-precision floating point format)


8.10 dSdX, dTdX, fdSdX, and fdTdX Registers 


The dSdX, dTdX, fdSdX, and fdTdX registers specify the change in the S/W and T/W texture coordinates
with respect to X of a triangle to be rendered. As a triangle is rendered, the d?dX registers are added to
the internal S/W and T/W registers when the pixel drawn moves from left-to-right, and are subtracted from
the internal S/W and T/W registers when the pixel drawn moves from right-to-left. Note that the delta S/W
and T/W values used by Napalm for rendering must be divided by W prior to being sent to Napalm (i.e.
Napalm uses ΔS/W and ΔT/W). The fd?dX registers are floating point equivalents of the d?dX registers.
Napalm automatically converts both the d?dX and fd?dX registers into an internal fixed point notation
used for rendering.


dSdX, dTdX


Bit    Description
31:0   Change in S and T with respect to X (fixed point two’s complement 14.18 format)


fdSdX, fdTdX


Bit    Description
31:0   Change in S and T with respect to X (IEEE 32-bit single-precision floating point format)


8.11 dWdX and fdWdX Registers 

The dWdX and fdWdX registers specify the change in 1/W with respect to X of a triangle to be rendered.
As a triangle is rendered, the dWdX register is added to the internal 1/W register when the pixel drawn
moves from left-to-right, and is subtracted from the internal 1/W register when the pixel drawn moves from
right-to-left. The fdWdX register is a floating point equivalent of the dWdX register. Napalm
automatically converts both the dWdX and fdWdX registers into an internal fixed point notation used for
rendering.


dWdX


Bit    Description
31:0   Change in W with respect to X (fixed point two’s complement 2.30 format)


fdWdX


Bit    Description
31:0   Change in W with respect to X (IEEE 32-bit single-precision floating point format)


8.12. dRdY, dGdY, dBdY, dAdY, fdRdY, fdGdY, fdBdY, and fdAdY Registers 


The dRdY, dGdY, dBdY, dAdY, fdRdY, fdGdY, fdBdY, and fdAdY registers specify the change in the
color information (red, green, blue, and alpha) with respect to Y of a triangle to be rendered. As a triangle
is rendered, the d?dY registers are added to the internal color component registers when the pixel
drawn moves in a positive Y direction, and are subtracted from the internal color component registers when
the pixel drawn moves in a negative Y direction. The fd?dY registers are floating point equivalents of the
d?dY registers. Napalm automatically converts both the d?dY and fd?dY registers into an internal fixed
point notation used for rendering.


dRdY, dGdY, dBdY, dAdY


Bit    Description
23:0   Change in color with respect to Y (fixed point two’s complement 12.12 format)


fdRdY, fdGdY, fdBdY, fdAdY


Bit    Description
31:0   Change in color with respect to Y (IEEE 32-bit single-precision floating point format)
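

This document does not say how software should derive the X and Y gradients. One common approach, shown
here only as an assumption, is to fit a plane through the parameter values at the three vertices. The Python
sketch below does this for a single parameter; the function name gradients and the (x, y, p) tuple layout are
hypothetical.

```python
def gradients(a, b, c):
    """Return (dp/dx, dp/dy) of the plane through three (x, y, p) vertices."""
    (ax, ay, ap), (bx, by, bp), (cx, cy, cp) = a, b, c
    # Twice the signed area of the triangle; zero means the vertices are collinear.
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    if area == 0:
        raise ValueError("degenerate triangle")
    dp_dx = ((bp - ap) * (cy - ay) - (cp - ap) * (by - ay)) / area
    dp_dy = ((cp - ap) * (bx - ax) - (bp - ap) * (cx - ax)) / area
    return dp_dx, dp_dy
```

The same function could be applied to each of red, green, blue, alpha, Z, S/W, T/W and 1/W in turn, with the
results converted to the register formats listed in the surrounding sections.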


8.13. dZdY and fdZdY Registers 


The dZdY and fdZdY registers specify the change in Z with respect to Y of a triangle to be rendered. As a
triangle is rendered, the dZdY register is added to the internal Z register when the pixel drawn moves
in a positive Y direction, and is subtracted from the internal Z register when the pixel drawn moves in a
negative Y direction. The fdZdY register is a floating point equivalent of the dZdY register. Napalm
automatically converts both the dZdY and fdZdY registers into an internal fixed point notation used for
rendering.


dZdY


Bit    Description
31:0   Change in Z with respect to Y (fixed point two’s complement 20.12 format)


fdZdY


Bit    Description
31:0   Change in Z with respect to Y (IEEE 32-bit single-precision floating point format)


8.14 dSdY, dTdY, fdSdY, and fdTdY Registers 


The dSdY, dTdY, fdSdY, and fdTdY registers specify the change in the S/W and T/W texture coordinates
with respect to Y of a triangle to be rendered. As a triangle is rendered, the d?dY registers are added to
the internal S/W and T/W registers when the pixel drawn moves in a positive Y direction, and are
subtracted from the internal S/W and T/W registers when the pixel drawn moves in a negative Y direction.
Note that the delta S/W and T/W values used by Napalm for rendering must be divided by W prior to being
sent to Napalm (i.e. Napalm uses ΔS/W and ΔT/W). The fd?dY registers are floating point equivalents of
the d?dY registers. Napalm automatically converts both the d?dY and fd?dY registers into an internal
fixed point notation used for rendering.


dSdY, dTdY
Bit     Description
31:0    Change in S and T with respect to Y (fixed point two’s complement 14.18 format)

fdSdY, fdTdY
Bit     Description
31:0    Change in S and T with respect to Y (IEEE 32-bit single-precision floating point format)


8.15 dWdY and fdWdY Registers 


The dWdY and fdWdY registers specify the change in 1/W with respect to Y of a triangle to be rendered.
As a triangle is rendered, the dWdY register is added to the internal 1/W register when the pixel drawn
moves in a positive Y direction, and is subtracted from the internal 1/W register when the pixel drawn
moves in a negative Y direction. The fdWdY registers are floating point equivalents of the dWdY
registers. Napalm automatically converts both the dWdY and fdWdY registers into an internal fixed point
notation used for rendering.


dWdY
Bit     Description
31:0    Change in W with respect to Y (fixed point two’s complement 2.30 format)

fdWdY
Bit     Description
31:0    Change in W with respect to Y (IEEE 32-bit single-precision floating point format)


8.16  triangleCMD and ftriangleCMD Registers 


The triangleCMD and ftriangleCMD registers execute the triangle drawing command. Writes to 
triangleCMD or ftriangleCMD initiate rendering a triangle defined by the vertex, start, d?dX, and d?dY 
registers. Note that the vertex, start, d?dX, and d?dY registers must be setup prior to writing to 
triangleCMD or ftriangleCMD. The value stored to triangleCMD or ftriangleCMD is the area of the 
triangle being rendered -- this value determines whether a triangle is clockwise or counter-clockwise 
geometrically. If bit(31)=0, then the triangle is oriented in a counter-clockwise orientation (i.e. positive 
area). If bit(31)=1, then the triangle is oriented in a clockwise orientation (i.e. negative area). To calculate 
the area of a triangle, the following steps are performed: 


1. The vertices (A, B, and C) are sorted by the Y coordinate in order of increasing Y (i.e. A.y <= B.y
<= C.y)
2. The area is calculated as follows: 


AREA = ((dxAB * dyBC) - (dxBC * dyAB)) / 2
where
dxAB = A.x - B.x
dyBC = B.y - C.y
dxBC = B.x - C.x
dyAB = A.y - B.y


Note that Napalm only requires the sign bit of the area to be stored in the triangleCMD and 
ftriangleCMD registers -- bits(30:0) written to triangleCMD and ftriangleCMD are ignored. 
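The area calculation above can be sketched in C as follows. The Vertex type and the function name are hypothetical host-side constructs, and the sketch assumes the caller has already sorted the vertices by increasing Y as step 1 requires. Since only the sign bit is examined by the hardware, the function returns a word with just bit(31) set or cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side vertex type (screen-space coordinates). */
typedef struct { float x, y; } Vertex;

/* Value to write to triangleCMD: bit(31)=1 for negative area (clockwise),
 * bit(31)=0 for positive area (counter-clockwise). Bits(30:0) are ignored. */
static uint32_t triangle_cmd_value(Vertex a, Vertex b, Vertex c)
{
    /* Vertices must already be sorted so that a.y <= b.y <= c.y. */
    assert(a.y <= b.y && b.y <= c.y);

    float dxAB = a.x - b.x;
    float dyBC = b.y - c.y;
    float dxBC = b.x - c.x;
    float dyAB = a.y - b.y;
    float area = ((dxAB * dyBC) - (dxBC * dyAB)) / 2.0f;

    return (area < 0.0f) ? 0x80000000u : 0u;
}
```

A degenerate (zero-area) triangle yields 0 in this sketch; how such triangles should be handled is left to the driver.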


triangleCMD
Bit     Description
31      Sign of the area of the triangle to be rendered

ftriangleCMD
Bit     Description
31      Sign of the area of the triangle to be rendered (IEEE 32-bit single-precision floating point
        format)


8.17 nopCMD Register 


Writing any data to the nopCMD register executes the NOP command. Executing a NOP command
flushes the graphics pipeline. The lsb of the data value written to nopCMD is used to optionally clear the
fbiPixelsIn, fbiChromaFail, fbiZfuncFail, fbiAfuncFail, fbiPixelsOut, and fbiStenciltestFail registers.
Writing a ‘1’ to the lsb of nopCMD will clear the aforementioned registers. Writing a ‘0’ to the lsb of
nopCMD will not modify the values of the aforementioned registers.

Bit     Description
0       Clear fbiPixelsIn, fbiChromaFail, fbiZfuncFail, fbiAfuncFail, fbiPixelsOut, and
        fbiStenciltestFail registers (1=clear registers)


8.18  fastfillCMD Register 


Writing any data to the fastfillCMD register executes the FASTFILL command. The FASTFILL command is
used to clear the RGB and depth buffers as quickly as possible. Prior to executing the FASTFILL
command, the clipLeftRight and clipTopBottom registers are loaded with the rectangular area to be
cleared. Note that the clip registers define a rectangular area which is inclusive of the clipLeft and
clipTop register values, but exclusive of the clipRight and clipBottom register values. The fastfillCMD
register is then written to initiate the FASTFILL command after the clip registers have been loaded.


ihe RGM clr epsied nell (20 an an opal cles dob ih doi va 
When ming in 15 BPP rendering mode, the -it alpha value stored int the ame uf caused as 
When using SGRAM, fastfillCMD[0] overrides fbzMode[8], and forces dithering off, allowing the color
plane to be filled using SGRAM blockwrites. When using SDRAM, dithering behavior is determined
solely by fbzMode[8].

Bit     Description
0       Disable dithering during fastfill (1 = disable dithering).


8.19 swapbufferCMD Register 


Writing to the swapbufferCMD register executes the SWAPBUFFER command. If the data written to 
swapbufferCMD bit(0)=0, then the frame buffer swapping is not synchronized with vertical retrace. If 
frame buffer swapping is not synchronized with vertical retrace, then visible frame “tearing” may occur. If 
swapbufferCMD bit(0)=1 then the frame buffer swapping is synchronized with vertical retrace. 
Synchronizing frame buffer swapping with vertical retrace eliminates the aforementioned frame “tearing.” 
When a swapbufferCMD is received in the front-end PCI host FIFO, the swap buffers pending field in the 
status register is incremented. Conversely, when an actual frame buffer swapping occurs, the swap buffers 
pending field in the status register (bits(30:28)) is decremented. The swap buffers pending field allows 
software to determine how many SWAPBUFFER commands are present in the Napalm FIFOs. Bits(8:1) 
of swapbufferCMD are used to specify the number of vertical retraces to wait before swapping the color 
buffers. An internal counter is incremented whenever a vertical retrace occurs, and the color buffers are 
not swapped until the internal vertical retrace counter is greater than the value of swapbufferCMD 
bits(8:1) -- after a swap occurs, the internal vertical retrace counter is cleared.
Setting swapbufferCMD[0]=1 is used to maintain constant frame rate. NOTE: for highest performance
when syncing-to-vsync, set the swap buffer interval (swapbufferCMD bits(8:1)) to zero.


Note that if vertical retrace synchronization is disabled for swapping buffers (swapbufferCMD(0)=0), then
the swap buffer interval field is ignored. The swapbufferCMD on Napalm works similarly to Voodoo Rush:
the driver must write to the swapbufferPend register to increase the outstanding swap count, then write to
the swapbufferCMD register.


To enable triple buffering, turn on the appropriate bit in dram_init_1. If triple buffering is enabled, then the 
graphics core will be allowed to continue given that one or fewer swaps is pending to be done by the video 
unit. Effectively, this allows Napalm to render up to two frames ahead of the displayed buffer. 


Bit     Description
0       Synchronize frame buffer swapping to vertical retrace (1=enable)
8:1     Swap buffer interval
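As an illustration of the field layout described above, the following C sketch builds the data word for a swapbufferCMD write. The function name is hypothetical. Returning 0 when synchronization is disabled reflects the statement that the interval field is ignored in that case; it is a choice of this sketch, not a hardware requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Build the value written to swapbufferCMD.
 *   bit(0)    = synchronize the swap to vertical retrace
 *   bits(8:1) = number of vertical retraces to wait before swapping
 * The driver is expected to write swapbufferPend first, then this value. */
static uint32_t swapbuffer_cmd_value(int sync_to_vretrace, unsigned interval)
{
    assert(interval <= 0xFFu);   /* the interval field is 8 bits wide */

    if (!sync_to_vretrace)
        return 0u;               /* interval is ignored when bit(0)=0 */

    return 1u | ((uint32_t)interval << 1);
}
```

Per the note above, swapbuffer_cmd_value(1, 0) is the setting recommended for highest performance when syncing to vertical retrace.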


8.20 fbzColorPath Register 


The fbzColorPath register controls the color and alpha rendering pixel pipelines. Bits in fbzColorPath 
control color/alpha selection and lighting. Individual bits of fbzColorPath are set to enable modulation, 
addition, etc. for various lighting effects including diffuse and specular highlights. 


Bit     Description
1:0     RGB Select (0=Iterated RGB, 1=TREX Color Output, 2=Color1 RGB, 3=Reserved).
3:2     Alpha Select (0=Iterated A, 1=TREX Alpha Output, 2=Color1 Alpha, 3=Reserved).
4       Color Combine Unit control (cc_localselect mux control: 0=iterated RGB, 1=Color0 RGB).
6:5     Alpha Combine Unit control (cca_localselect mux control: 0=iterated alpha, 1=Color0
        alpha, 2=iterated Z, 3=clamped iterated W). Only used if combineMode[31]=0.
7       Color Combine Unit control (cc_localselect_override mux control: 0=cc_localselect,
        1=Texture alpha bit(7))
8       Color Combine Unit control (cc_zero_other mux control: 0=c_other, 1=zero)
9       Color Combine Unit control (cc_sub_clocal mux control: 0=zero, 1=c_local)
12:10   Color Combine Unit control (cc_mselect mux control: 0=zero, 1=c_local, 2=a_other,
        3=a_local, 4=texture alpha, 5=texture rgb)
13      Color Combine Unit control (cc_reverse_blend control)
14      Color Combine Unit control (cc_add_clocal control)
15      Color Combine Unit control (cc_add_alocal control)
16      Color Combine Unit control (cc_invert_output control)
17      Alpha Combine Unit control (cca_zero_other mux control: 0=a_other, 1=zero)
18      Alpha Combine Unit control (cca_sub_clocal mux control: 0=zero, 1=a_local)
21:19   Alpha Combine Unit control (cca_mselect mux control: 0=zero, 1=a_local, 2=a_other,
        3=a_local, 4=texture alpha)
22      Alpha Combine Unit control (cca_reverse_blend control)
23      Alpha Combine Unit control (cca_add_clocal control)
24      Alpha Combine Unit control (cca_add_alocal control)
25      Alpha Combine Unit control (cca_invert_output control)
26      Parameter Adjust (1=adjust parameters for subpixel correction)
27      Enable Texture Mapping (1=enable)
28      Enable RGBA, Z, and W parameter clamping (1=enable)


Note that the color channels are controlled separately from the alpha channel. There are two primary color
selection units: the Color Combine Unit (CCU) and the Alpha Combine Unit (ACU).

[Figure: Avenger+ Datapath, Color Combine Unit. The c_other and c_local inputs (8-bit, 0.0.8 format,
where format = {sign.int.frac}) are chosen by cc_otherselect[2:0] and cc_localselect[2:0], with c_other
passing through the chroma key check. The cc_zero_other, cc_sub_clocal, cc_mselect[2:0],
cc_reverse_blend, cc_add_clocal, cc_add_alocal, cc_invert_add_local, cc_outshift[1:0] and
cc_invert_output controls configure the 10-bit signed by 9-bit unsigned multiply (LSBs truncated, no
rounding), add, modulate (1x, 2x, 4x) and clamp (0-FF) stages.]

cc_localsel calculations:
    [0] = (cc_localselect_override) ? texture alpha[0] : cc_localselect[0]
    [1] = !cc_localselect_override & cc_localselect[1]
    [2] = !cc_localselect_override & cc_localselect[2]

[Figure: Avenger+ Datapath, Alpha Combine Unit. The a_other and a_local inputs (8-bit, 0.0.8 format) are
chosen by cca_otherselect[1:0] and cca_localselect[2:0], with a_other passing through the alpha mask
check. The cca_mselect[2:0], cca_reverse_blend, cca_add_clocal, cca_add_alocal, cca_invert_add_local,
cca_outshift[1:0] and cca_invert_output controls configure the same multiply, add, modulate (1x, 2x, 4x)
and clamp (0-FF) stages as in the Color Combine Unit.]

Bit(26) of fbzColorPath enables subpixel correction for all parameters. When enabled, Napalm will
automatically subpixel correct the incoming color, depth, and texture coordinate parameters for triangles
not aligned on integer spatial boundaries. Enabling subpixel correction decreases the on-chip triangle setup
performance from 7 clocks to 16 clocks, but as the triangle setup engine is separately pipelined from the
triangle rasterization engine, little if any performance penalty is seen when subpixel correction is enabled.


Important Note: When subpixel correction is enabled, the correction is performed on the start registers as 
they are passed into the triangle setup unit from the PCI FIFO. As a result, the host must pass down new 
starting parameter information for each new triangle -- if new starting parameter information is not passed 
down for a new triangle, the starting parameters will be subpixel corrected starting with the start registers 
already subpixel corrected for the last rendered triangle [in effect the parameters will be subpixel corrected 
twice, resulting in inaccuracies in the starting parameter values]. 


Bit(27) of fbzColorPath is used to enable texture mapping. If texture-mapped rendering is desired, then
bit(27) of fbzColorPath must be set. When bit(27)=1, data is transferred from TREX to FBI. If
texture mapping is not desired (i.e. Gouraud shading, flat shading, etc.), then bit(27) may be cleared and no
data is transferred from TREX to FBI.


Bit(28) of fbzColorPath is used to enable RGBA, Z, and W parameter clamping. When fbzColorPath
bit(28)=1, then the RGBA triangle parameters are clamped to [0,0xff] inclusive during triangle
rasterization. Note that fbzColorPath bit(28) has no effect on the RGBA triangle parameters during
triangle setup or sub-pixel correction. When fbzColorPath bit(28)=0, then the RGBA parameters are
allowed to wrap according to the following formula:


if (rgbaIterator[23:12] == 0xfff)
    rgbaClamped[7:0] = 0x0;
else if (rgbaIterator[23:12] == 0x100)
    rgbaClamped[7:0] = 0xff;
else
    rgbaClamped[7:0] = rgbaIterator[19:12];


When fbzColorPath bit(28)=1, then the Z triangle parameter is clamped to [0,0xffff] inclusive during
triangle rasterization. Note that fbzColorPath bit(28) has no effect on the Z triangle parameter during
triangle setup or sub-pixel correction. Note also that the unclamped Z triangle iterator is used when
performing floating point Z-buffering (fbzMode bit(21)=1). When fbzColorPath bit(28)=0, then the Z
parameter is allowed to wrap according to the following formula:


if (zIterator[31:12] == 0xfffff)
    zClamped[15:0] = 0x0;
else if (zIterator[31:12] == 0x10000)
    zClamped[15:0] = 0xffff;
else
    zClamped[15:0] = zIterator[27:12];


When fbzColorPath bit(28)=1, then the W triangle parameter is clamped to [0,0xff] inclusive for use in
the Alpha Combine Unit and the fog unit. Note that fbzColorPath bit(28) has no effect on the W triangle
parameter during triangle setup or sub-pixel correction. Note also that the unclamped W triangle iterator is
used when performing floating point W-buffering (fbzMode bit(21)=0). When fbzColorPath bit(28)=0,
then the W parameter used as input to the ACU and fog units is allowed to wrap according to the
following formula:

if (wIterator[47:32] == 0xffff)
    wClamped[7:0] = 0x0;
else if (wIterator[47:32] == 0x0100)
    wClamped[7:0] = 0xff;
else
    wClamped[7:0] = wIterator[39:32];
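The three wrap formulas can be transcribed into C as a host-side reference model. This is only a sketch: the function names are hypothetical, the constants are read as hexadecimal, the second comparison of the W formula is assumed to test the W iterator, and the iterator widths (24, 32 and 48 bits) are inferred from the bit ranges used in the formulas.

```c
#include <assert.h>
#include <stdint.h>

/* RGBA iterator: 24 bits, integer part in bits 23:12. */
static uint8_t wrap_rgba(uint32_t it)
{
    assert((it >> 24) == 0u);
    uint32_t hi = (it >> 12) & 0xFFFu;
    if (hi == 0xFFFu) return 0x00;
    if (hi == 0x100u) return 0xFF;
    return (uint8_t)(hi & 0xFFu);            /* bits 19:12 */
}

/* Z iterator: 32 bits, integer part in bits 31:12. */
static uint16_t wrap_z(uint32_t it)
{
    uint32_t hi = it >> 12;
    if (hi == 0xFFFFFu) return 0x0000;
    if (hi == 0x10000u) return 0xFFFF;
    return (uint16_t)(hi & 0xFFFFu);         /* bits 27:12 */
}

/* W iterator: 48 bits, bits 47:32 examined. */
static uint8_t wrap_w(uint64_t it)
{
    assert((it >> 48) == 0u);
    uint32_t hi = (uint32_t)((it >> 32) & 0xFFFFu);
    if (hi == 0xFFFFu) return 0x00;
    if (hi == 0x0100u) return 0xFF;
    return (uint8_t)(hi & 0xFFu);            /* bits 39:32 */
}
```

In each case the two special values catch an iterator that has stepped just below zero or just above the maximum, and every other value passes its low-order integer bits through unchanged.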


fbzColorPath bits(31:30) control the column banding selection for the triangle iterators. fbzColorPath
bits(31:30) can be changed to optimize performance for a given application.


8.21 combineMode Register 
The combineMode register, along with the fbzColorPath and textureMode registers, controls the color
and alpha rendering and texture pixel pipelines. Note that the chip field is used to direct writes to each
individual combine unit.


Bit     Description
        RGB-channel Combine Unit Control (cc_outshift) (0=no modulate, 1=modulate by 2,
        2=modulate by 4, 3=reserved)
17:16   Alpha-channel Combine Unit Control (cca_otherselect)
20:18   Alpha-channel Combine Unit Control (cca_localselect)
        Alpha-channel Combine Unit Control (cca_invert_other)
        Alpha-channel Combine Unit Control (cca_invert_local)
        Alpha-channel Combine Unit Control (cca_invert_add_local)
        Alpha-channel Combine Unit Control (cca_outshift) (0=no modulate, 1=modulate by 2,
        2=modulate by 4, 3=reserved)
29      Enable 2 pixel-per-clock rendering operation (1=enable)
30      Disable texture chroma substitution (1=use chromaRange register for constant colors
        into the texture blending units)
31      Color Combine Mode MUX selection control (0=use fbzColorPath, 1=use
        combineMode)


The datapath control names (e.g. cc_otherselect, cca_localselect) are given for the Color Combine Unit and 
Alpha Combine Units. When the chip field directs a write to the combineMode register in a texture unit, 
the datapath controls are the texture datapath controls (e.g. tc_otherselect, tca_localselect). 


When chromaRange[30:29]=0x3 and combineMode[30]=0, then texture chroma substitution is enabled. 
When combineMode[30]=1, then the chromaRange register is used to store constant color values in the 
texture units for texture blending. The chip fields are used to store different constant color values into each 
texture unit. 


For the Color Combine and Alpha Combine Units (FBI combine units), combineMode bit(31) controls
whether the controls for the “other,” “local,” and “mselect_7” MUX’s come from fbzColorPath or
combineMode as follows:

MUX Control         combineMode bit(31)=0           combineMode bit(31)=1
cc_otherselect      {0, fbzColorPath[1:0]}          combineMode[2:0]
cc_localselect      {00, fbzColorPath[4]}           combineMode[5:3]
cc_mselect_7        {000}                           combineMode[8:6]
cca_otherselect     fbzColorPath[3:2]               combineMode[17:16]
cca_localselect     {1’b0, fbzColorPath[6:5]}       combineMode[20:18]
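The selection rule in the table can be modeled in C as shown below. This is a sketch only: the struct and function names are hypothetical, and the assignment of each table row to a named control (cc_otherselect, cc_localselect, cc_mselect_7, cca_otherselect, cca_localselect) is inferred from the fbzColorPath bit descriptions rather than stated in the table itself.

```c
#include <assert.h>
#include <stdint.h>

/* Resolved mux controls for the FBI combine units (hypothetical type). */
typedef struct {
    unsigned cc_otherselect;   /* 3 bits */
    unsigned cc_localselect;   /* 3 bits */
    unsigned cc_mselect_7;     /* 3 bits */
    unsigned cca_otherselect;  /* 2 bits */
    unsigned cca_localselect;  /* 3 bits */
} CombineSelects;

static CombineSelects resolve_selects(uint32_t fbzColorPath, uint32_t combineMode)
{
    CombineSelects s;

    if (combineMode >> 31) {
        /* combineMode bit(31)=1: controls come from combineMode. */
        s.cc_otherselect  = combineMode & 0x7u;
        s.cc_localselect  = (combineMode >> 3) & 0x7u;
        s.cc_mselect_7    = (combineMode >> 6) & 0x7u;
        s.cca_otherselect = (combineMode >> 16) & 0x3u;
        s.cca_localselect = (combineMode >> 18) & 0x7u;
    } else {
        /* combineMode bit(31)=0: zero-extended fbzColorPath fields. */
        s.cc_otherselect  = fbzColorPath & 0x3u;
        s.cc_localselect  = (fbzColorPath >> 4) & 0x1u;
        s.cc_mselect_7    = 0u;
        s.cca_otherselect = (fbzColorPath >> 2) & 0x3u;
        s.cca_localselect = (fbzColorPath >> 5) & 0x3u;
    }

    /* Every control fits its mux select width. */
    assert(s.cc_otherselect <= 7u && s.cca_otherselect <= 3u);
    return s;
}
```

The zero extension in the fbzColorPath branch means that, in that mode, only the lower inputs of each mux are reachable.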


For the Texture Combine and Texture Alpha Combine units (in each TMU), the “other,” “local” and 
“mselect_7” (for the Texture Combine unit) always come from the combineMode register. 


CombineMode bit(29) is used to enable 2 pixel-per-clock rendering. 2 pixel-per-clock rendering is only
allowed to be enabled when a single texture is being applied per triangle. CombineMode bit(29) must be
disabled when dual-texturing is being utilized. When 2 pixel-per-clock rendering is enabled
(combineMode bit(29)=1), renderMode bits(24:22) contain the log2 of the number of scanlines rendered
by each texture unit. RenderMode bits(24:22) can be changed to optimize performance for a given
application. Also note that when 2 pixel-per-clock rendering is enabled, writes to either texture unit will
be received by both. In other words, the ability to selectively write state and triangle information to
individual texture units is disabled when 2 pixel-per-clock rendering is enabled, as writes to either texture
unit are received by both units. This functionality can be disabled by setting miscInit1[18].


*** Important Note: Due to a bug in the Napalm hardware, when switching from 2 pixel-per-clock
rendering to single pixel-per-clock rendering, the TMU units must be idled. To accomplish this, send
down at least 12 NOP commands with the chip field set to select both TMUs prior to the write to the
combineMode register clearing bit(29). This will flush the TMU pipelines (12 NOPs are necessary due to
the pipelining between the FBI and TMU chips) before single pixel-per-clock rendering is selected by
clearing combineMode bit(29). Note that the chip field for the NOP commands sent to idle the TMUs
should not select the FBI chip (i.e. set chipField=0x6), as this will cause the entire pixel pipeline to be
flushed, resulting in unnecessary performance loss. Also note that sending 12 NOP commands prior to the
combineMode write is not necessary when changing from single pixel-per-clock rendering mode to 2
pixel-per-clock rendering mode.
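The workaround sequence can be sketched in C as follows. Everything about the register access mechanism here is an assumption: RegWriteFn is a hypothetical callback standing in for whatever the driver uses to issue a register write with a given chip field, registers are identified by name purely for illustration, and the chip field used for the final combineMode write is left to the caller. The recording stub exists only so the sequence can be exercised without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: write `value` to register `reg` using `chip_field`. */
typedef void (*RegWriteFn)(void *ctx, unsigned chip_field,
                           const char *reg, uint32_t value);

enum {
    CHIP_FIELD_BOTH_TMUS = 0x6,  /* selects both TMUs but not FBI */
    TMU_FLUSH_NOP_COUNT  = 12,   /* minimum NOP count stated in the note */
    COMBINE_2PPC_BIT     = 29
};

/* Leave 2 pixel-per-clock mode: idle the TMUs, then clear combineMode bit(29). */
static void disable_two_pixel_mode(RegWriteFn write_reg, void *ctx,
                                   unsigned combine_chip_field,
                                   uint32_t combine_mode)
{
    assert(write_reg != NULL);

    for (int i = 0; i < TMU_FLUSH_NOP_COUNT; i++)
        write_reg(ctx, CHIP_FIELD_BOTH_TMUS, "nopCMD", 0u);

    write_reg(ctx, combine_chip_field, "combineMode",
              combine_mode & ~(1u << COMBINE_2PPC_BIT));
}

/* Recording stub for testing the sequence (not part of any real driver). */
typedef struct {
    int nops;
    int total;
    uint32_t last_value;
    unsigned last_chip;
} WriteLog;

static void log_write(void *ctx, unsigned chip_field,
                      const char *reg, uint32_t value)
{
    WriteLog *log = (WriteLog *)ctx;
    if (reg[0] == 'n' && chip_field == CHIP_FIELD_BOTH_TMUS)
        log->nops++;
    log->total++;
    log->last_value = value;
    log->last_chip = chip_field;
}
```

No corresponding helper is needed for the opposite transition, since the note states that the NOPs are unnecessary when enabling 2 pixel-per-clock rendering.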


8.22 fogMode Register 
The fogMode register is used to control the fog, alpha-blending, and dithering functionality of Napalm. 


Bit     Description
0       Enable fog (1=enable)
1       Fog Unit control (fogadd control)
2       Fog Unit control (fogmult control)
3       Fog Unit control (fogalpha control)
4       Fog Unit control (fogz control)
5       Fog Unit control (fogconstant control: 0=fog multiplier output, 1=fogColor)
        RGB Channel alpha blending order of operations (0=[(S · α) alphaBlendOP (Dold · β)],
        1=[(Dold · β) alphaBlendOP (S · α)])
        Alpha Channel alpha blending order of operations (0=[(S · α) alphaBlendOP (Dold · β)],
        1=[(Dold · β) alphaBlendOP (S · α)])
        Alpha blending operation (alphaBlendOP) (0=add, 1=subtract)

The diagram below shows the fog unit of Napalm:

[Figure: Napalm fog unit. Iterated W (4.12 floating point) indexes two 64x8 RAMs (fog alpha and fog
delta alpha) with {4 bits exponent, mantissa(11:10)}; mantissa(9:2) scales the fog delta alpha, with
optional dithering (fogdither, dither matrix) and the fogzones control. The {fogz, fogalpha} mux selects
among fog table alpha, iterated alpha, iterated Z(27:20) clamped, and iterated W(39:32) clamped. The
fogadd, fogmult, fogconstant and fogenable controls combine fogColor with the color from the Color
Combine Unit; the result is clamped to FF to give the fogged color.]

Bit(0) of fogMode is used to enable fog and atmospheric effects. When fog is enabled, the fog color
specified in the fogColor register is blended with the source pixels as a function of the fogTable values
and iterated W. Napalm supports a 64-entry lookup table (fogTable) to support atmospheric effects such
as fog and haze. When enabled, the MSBs of a normalized floating point representation of (1/W) are used
to index into the 64-entry fog table. The output of the lookup table is an “alpha” value which represents the
level of blending to be performed between the static fog/haze color and the incoming pixel color. 8 lower
order bits of the floating point (1/W) are used to blend between multiple entries of the lookup table to
reduce fog “banding.” The fog lookup table is loaded by the Host CPU, so various fog equations, colors,
and effects can be supported.


The following table shows the mathematical equations for the supported values of bits(2:1) of fogMode 
when bits(5:3)=0: 


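Bit(0) - Enable Fog | Bit(1) - fogadd mux control | Bit(2) - fogmult mux control | Fog Equation
0                   | ignored                     | ignored                      | Cout = Cin
1                   | 0                           | 0                            | Cout = AFog*Cfog + (1 - AFog)*Cin
1                   | 0                           | 1                            | Cout = AFog*Cfog
1                   | 1                           | 0                            | Cout = (1 - AFog)*Cin
1                   | 1                           | 1                            | Cout = 0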


where: 
Cout = Color output from Fog block 
Cin = Color input from Color Combine Unit Module 
Cfog = fogColor register 
AFog = alpha value calculated from Fog table 


When bit(3) of fogMode is set, the integer part of the iterated alpha component is used as the fog alpha
instead of the calculated fog alpha value from the fog table. When bit(4) of fogMode is set, the upper 8
bits of the iterated Z component are used as the fog alpha instead of the calculated fog alpha value from the
fog table. If both bit(3) and bit(4) are set, then bit(4) takes precedence, and the upper 8 bits of the iterated
Z component are used for the fog alpha value. Bit(5) of fogMode takes precedence over bits(4:3) and
enables a constant value (fogColor) to be added to the incoming source color.
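The precedence among fogMode bits(4:3) described above can be written as a small C selection function. This is a sketch with hypothetical names; the three candidate alpha values are passed in already extracted, and the constant-add mode selected by bit(5) is deliberately not modeled, so the function asserts that bit(5) is clear.

```c
#include <assert.h>
#include <stdint.h>

/* Choose the fog alpha source from fogMode bits(4:3).
 *   table_alpha    : alpha computed from the fog table
 *   iter_alpha_int : integer part of the iterated alpha component
 *   iter_z_hi      : upper 8 bits of the iterated Z component */
static uint8_t fog_alpha_source(uint32_t fogMode, uint8_t table_alpha,
                                uint8_t iter_alpha_int, uint8_t iter_z_hi)
{
    /* bit(5) overrides bits(4:3); that mode is outside this sketch. */
    assert((fogMode & (1u << 5)) == 0u);

    if (fogMode & (1u << 4))
        return iter_z_hi;        /* bit(4) takes precedence over bit(3) */
    if (fogMode & (1u << 3))
        return iter_alpha_int;
    return table_alpha;
}
```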


8.23 alphaMode Register 
The alphaMode register controls the alpha blending and anti-aliasing functionality of Napalm. 


Bit     Description
0       Enable alpha function (1=enable)
3:1     Alpha function (see table below)
4       Enable alpha blending (1=enable)
7:5     Reserved
11:8    Source RGB alpha blending factor (see table below)
15:12   Destination RGB alpha blending factor (see table below)
19:16   Source alpha-channel alpha blending factor (see table below)
23:20   Destination alpha-channel alpha blending factor (see table below)
31:24   Alpha reference value


Bits(3:1) specify the alpha function during rendering operations. The alpha function and test pipeline is 
shown below: 


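[Figure: alpha function and test pipeline. The alpha from the Alpha Combine Unit is compared with
alphaMode(31:24) according to the alpha function; the comparison result, gated by the alpha test
enable, produces the alpha test pass signal.]


When alphaMode bit(0)=1, an alpha comparison is performed between the incoming source alpha and
bits(31:24) of alphaMode. Section 8.23.1 below further describes the alpha function algorithm.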


Bit(4) of alphaMode enables alpha blending. When alpha blending is enabled, the blending function is
performed to combine the source color with the destination pixel. The blending factors of the source and
destination pixels are individually programmable, as determined by bits(23:8). Note that the RGB and
alpha color channels may have different alpha blending factors. Section 8.23.2 below further describes
alpha blending.


Bit(5) of alphaMode is reserved. 
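For illustration, the alphaMode fields named in this section can be packed into a register value as in the C sketch below. The function and parameter names are hypothetical. The field positions follow the text: bit(0) alpha function enable, bits(3:1) alpha function, bit(4) alpha blending enable, bits(15:8) RGB blending factors, bits(23:16) alpha-channel blending factors, and bits(31:24) alpha reference value; placing the source factor in the low nibble of each factor pair is inferred from the destination factors occupying 15:12 and 23:20.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an alphaMode register value from its individual fields. */
static uint32_t alpha_mode_pack(int alpha_func_enable, unsigned alpha_func,
                                int blend_enable,
                                unsigned src_rgb, unsigned dst_rgb,
                                unsigned src_a, unsigned dst_a,
                                unsigned alpha_ref)
{
    assert(alpha_func <= 0x7u);
    assert(src_rgb <= 0xFu && dst_rgb <= 0xFu);
    assert(src_a <= 0xFu && dst_a <= 0xFu);
    assert(alpha_ref <= 0xFFu);

    return (alpha_func_enable ? 1u : 0u)
         | ((uint32_t)alpha_func << 1)
         | ((blend_enable ? 1u : 0u) << 4)
         | ((uint32_t)src_rgb << 8)
         | ((uint32_t)dst_rgb << 12)
         | ((uint32_t)src_a << 16)
         | ((uint32_t)dst_a << 20)
         | ((uint32_t)alpha_ref << 24);
}
```

Bits(7:5) are left at zero, consistent with bit(5) being reserved.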


8.23.1 Alpha function 


When the alpha function is enabled (alphaMode bit(0)=1), the following alpha comparison is performed:
AlphaSrc AlphaOP AlphaRef
where AlphaSrc represents the alpha value of the incoming source pixel, and AlphaRef is the value of
bits(31:24) of alphaMode. A source pixel is written into an RGB buffer if the alpha comparison is true
and writing into the RGB buffer is enabled (fbzMode bit(9)=1). If the alpha function is enabled and the
alpha comparison is false, the fbiAfuncFail register is incremented and the pixel is invalidated in the pixel
pipeline and no drawing occurs to the color or depth buffers. The supported alpha
comparison functions (AlphaOPs) are shown below:


Value   AlphaOP Function
0       Never
1       Less than
2       Equal
3       Less than or equal
4       Greater than
5       Not equal
6       Greater than or equal
7       Always
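A reference model of the alpha comparison can be written in C as below. The function name is hypothetical, and the numeric encoding (0 through 7, in the order the functions are listed) is an assumption based on the three-bit width of alphaMode bits(3:1).

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate "AlphaSrc AlphaOP AlphaRef"; returns 1 if the pixel passes. */
static int alpha_test(unsigned alpha_op, uint8_t alpha_src, uint8_t alpha_ref)
{
    assert(alpha_op <= 7u);   /* alphaMode bits(3:1) */

    switch (alpha_op) {
    case 0:  return 0;                        /* never */
    case 1:  return alpha_src <  alpha_ref;   /* less than */
    case 2:  return alpha_src == alpha_ref;   /* equal */
    case 3:  return alpha_src <= alpha_ref;   /* less than or equal */
    case 4:  return alpha_src >  alpha_ref;   /* greater than */
    case 5:  return alpha_src != alpha_ref;   /* not equal */
    case 6:  return alpha_src >= alpha_ref;   /* greater than or equal */
    default: return 1;                        /* always */
    }
}
```

A pixel for which this function returns 0 corresponds to the case in which fbiAfuncFail is incremented and the pixel is invalidated.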


8.23.2 Alpha Blending 


When alpha blending is enabled (alphaMode bit(4)=1), incoming source pixels are blended with
destination pixels. The alpha blending function for the RGB color components depends on the order of
operations and the alpha blending operation selected in fogMode:

Order (fogMode)     Operation
0                   Dnew = (S · α) alphaBlendOP (Dold · β)
1                   Dnew = (Dold · β) alphaBlendOP (S · α)


where
Dnew    The new destination pixel being written into the frame buffer
S       The new source pixel being generated
Dold    The old (current) destination pixel about to be modified
α       The source pixel alpha blending function.
β       The destination pixel alpha blending function.

The alpha blending function for the alpha component is as follows:

Anew = (AS · αd) alphaBlendOP (Aold · βd)

where
Anew    The new destination alpha being written into the alpha buffer
AS      The new source alpha being generated
Aold    The old (current) destination alpha about to be modified
αd      The source alpha alpha-blending function.
βd      The destination alpha alpha-blending function.


Note that the source and destination pixels may have different associated alpha blending functions. Also
note that RGB color components and the alpha components may have different associated alpha blending
functions. The alpha blending factors of the RGB color components are defined in bits(15:8) of
alphaMode, while the alpha blending factors of the alpha component are specified in bits(23:16) of
alphaMode. The following table lists the alpha blending functions supported:

Value                                       Alpha Blending Function   Blending Factor
0x0                                         AZERO                     Zero
0x1                                         ASRC_ALPHA                Source alpha
0x2                                         A_COLOR                   Color
0x3                                         ADST_ALPHA                Destination alpha
0x4                                         AONE                      One
0x5                                         AOMSRC_ALPHA              1 - Source alpha
0x6                                         AOM_COLOR                 1 - Color
0x7                                         AOMDST_ALPHA              1 - Destination alpha
0xa-0xe                                     Reserved
0xf (source alpha blending function)        ASATURATE                 MIN(Source alpha, 1 - Destination alpha)
0xf (destination alpha blending function)   A_COLORBEFOREFOG          Color before Fog Unit


When the value 0x2 is selected as the destination alpha blending factor (A_COLOR function), the source 
pixel color is used as the destination blending factor. When the value 0x2 is selected as the source alpha 
blending factor, the destination pixel color is used as the source blending factor. 


Note also that the alpha blending function 0xf is different depending upon whether it is being used as a
source or destination alpha blending function. When the value 0xf is selected as the destination alpha
blending factor, the source color before the fog unit (“unfogged” color) is used as the destination
blending factor -- this alpha blending function is useful for multi-pass rendering with atmospheric
effects. When the value 0xf is selected as the source alpha blending factor, the alpha-saturate
anti-aliasing algorithm is selected -- this MIN function performs polygonal anti-aliasing for polygons
which are drawn front-to-back.


15/16 BPP alpha channel alpha blending modes 
8.24 lfbMode Register
The lfbMode register controls linear frame buffer accesses and queued VMI host port accesses.


Bit     Description
3:0     Linear frame buffer write format (see table below)
        Reserved
        Reserved
8       Enable Napalm pixel pipeline-processed linear frame buffer writes (1=enable)
10:9    Linear frame buffer RGBA lanes (see tables below)
11      16-bit word swap linear frame buffer writes (1=enable)
12      Byte swizzle linear frame buffer writes (1=enable)
13      LFB access Y origin (0=top of screen is origin, 1=bottom of screen is origin)
14      Linear frame buffer write access W select (0=LFB selected, 1=zaColor[15:0])
15      Reserved


The following table shows the supported Napalm linear frame buffer write formats: 


Value   Linear Frame Buffer Write Format
        16-bit formats
0       16-bit RGB (5-6-5)
4       24-bit RGB (x-8-8-8)
5       32-bit ARGB (8-8-8-8)
        Reserved
9       Queued VMI host port write


When accessing the linear frame buffer, the CPU accesses information from the starting linear frame buffer
(LFB) address space (see section 4 on Napalm address space) plus an offset which determines the <x,y>
coordinates being accessed. Bits(3:0) of lfbMode define the format of linear frame buffer writes.


When writing to the linear frame buffer, lfbMode bit(8)=1 specifies that LFB pixels are processed by the
normal Napalm pixel pipeline -- this implies each pixel written must have an associated depth and alpha
value, and is also subject to the fog mode, alpha function, etc. If bit(8)=0, pixels written using LFB access
bypass the normal Napalm pixel pipeline and are written unconditionally into the color/depth buffers,
except for optional color dithering [depth function, alpha blending, alpha test, and color/depth write
masks are all bypassed when bit(8)=0]. If bit(8)=0, then only the buffers that are specified in the
particular LFB format are updated. Also note that if lfbMode bit(8)=0, the color and Z mask bits in
fbzMode (bits 9 and 10) are ignored for LFB writes.
For example, if LFB modes 0-2, or 4 are used and bit(8)=0, then only the color buffers are updated for LFB
writes (the depth buffer is unaffected by all LFB writes for these modes, regardless of the status of the Z-
mask bit fbzMode bit 10). However, if LFB modes 12-14 are used and bit(8)=0, then both the color and
depth buffers are updated with the LFB write data, irrespective of the color and Z mask bits in fbzMode.

If LFB mode 15 is used and bit(8)=0, then only the depth buffer is updated for LFB writes (the color
buffers are unaffected by all LFB writes in this mode, regardless of the status of the color mask bits in
fbzMode).


If IfbMode bit(8)=0 and a LFB write format is selected which contains an alpha component (formats 2, 5, 
and 14) and the alpha buffer is enabled, then the alpha component is written into the alpha buffer. 
Conversely, if the alpha buffer is not enabled, then the alpha component of LFB writes using formats 2, 5, 
and 14 when bit(8)=0 are ignored. Note that anytime LFB formats 2, 5, and 14 are used when bit(8)=0 that 
blending and/or chroma-keying using the alpha component is not performed since the pixel-pipeline is 
bypassed when bit(8)=0. 


If lfbMode bit(8)=0 and LFB write format 14 is used, the component that is ignored is determined by
whether the alpha buffer is enabled -- if the alpha buffer is enabled and LFB write format 14 is used with
bit(8)=0, then the depth component is ignored for all LFB writes. Conversely, if the alpha buffer is
disabled and LFB write format 14 is used with bit(8)=0, then the alpha component is ignored for all LFB
writes.
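
The bypass-mode rules above (lfbMode bit(8)=0) can be summarized as a small lookup. The following is only a sketch: the type `lfb_targets` and the function `lfb_bypass_targets` are hypothetical names, and treating format 5 as "color plus optional alpha" is an assumption drawn from the alpha-component paragraph rather than an explicit statement in the text.

```c
#include <stdbool.h>

/* Buffers touched by one LFB write when the pixel pipeline is bypassed. */
typedef struct {
    bool color;
    bool depth;
    bool alpha;
} lfb_targets;

/* Hypothetical helper: format is the LFB write format number,
 * alpha_buffer_enabled reflects whether the alpha buffer is enabled. */
static lfb_targets lfb_bypass_targets(unsigned format, bool alpha_buffer_enabled)
{
    lfb_targets t = { false, false, false };

    switch (format) {
    case 0: case 1: case 4:      /* color only */
        t.color = true;
        break;
    case 2: case 5:              /* color, alpha only if the alpha buffer is enabled */
        t.color = true;
        t.alpha = alpha_buffer_enabled;
        break;
    case 12: case 13:            /* color and depth */
        t.color = true;
        t.depth = true;
        break;
    case 14:                     /* color, then either alpha or depth */
        t.color = true;
        t.alpha = alpha_buffer_enabled;
        t.depth = !alpha_buffer_enabled;
        break;
    case 15:                     /* depth only */
        t.depth = true;
        break;
    default:                     /* unsupported formats or format 9: nothing written */
        break;
    }
    return t;
}
```

For instance, format 14 with the alpha buffer enabled reports color and alpha but not depth, matching the paragraph above.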


If lfbMode bit(8)=1 and an LFB write access format does not include depth or alpha information (formats
0-5), then the appropriate depth and/or alpha information for each pixel written is taken from the zaColor
register. Note that if bit(8)=1, the LFB write pixels are processed by the normal Napalm pixel pipeline
and thus are subject to the per-pixel operations including clipping, dithering, alpha-blending, alpha-testing,
depth-testing, chroma-keying, fogging, and color/depth write masking.


Bits(10:9) of lfbMode specify the RGB channel format (color lanes) for linear frame buffer writes. The
table below shows the Napalm supported RGB lanes:


lfbMode bits(10:9)    RGB Channel Format
0                     ARGB
1                     ABGR
2                     RGBA
3                     BGRA


Bit(11) of lfbMode defines the format of two 16-bit data types passed with a single 32-bit write. For linear
frame buffer formats 0-2, two 16-bit data transfers can be packed into one 32-bit write -- bit(11) defines
which 16-bit shorts correspond to which pixels on screen. The table below shows the pixel packing for
packed 32-bit linear frame buffer formats 0-2:


lfbMode bit(11)    Screen Pixel Packing
0                  Right Pixel(host data 31:16), Left Pixel(host data 15:0)
1                  Left Pixel(host data 31:16), Right Pixel(host data 15:0)


For linear frame buffer formats 12-14, bit(11) of lfbMode defines the bit locations of the two 16-bit data
types passed. The table below shows the data packing for 32-bit linear frame buffer formats 12-14:


lfbMode bit(11)    Screen Pixel Packing
0                  Z value(host data 31:16), RGB value(host data 15:0)
1                  RGB value(host data 31:16), Z value(host data 15:0)


For linear frame buffer format 15, bit(11) of lfbMode defines the bit locations of the two 16-bit depth values
passed. The table below shows the data packing for 32-bit linear frame buffer format 15:


lfbMode bit(11)    Screen Pixel Packing
0                  Z Right Pixel(host data 31:16), Z Left Pixel(host data 15:0)
1                  Z Left Pixel(host data 31:16), Z Right Pixel(host data 15:0)


Note that bit(11) of lfbMode is ignored for linear frame buffer writes using formats 4, 5 or 8.


Bit(12) of lfbMode is used to enable byte swizzling. When byte swizzling is enabled, the 4 bytes within a
32-bit word are swizzled to correct for endian differences between Napalm and the host CPU. Little
endian CPUs (e.g. Intel x86 processors) should not enable byte swizzling, whereas big endian CPUs
(e.g. PowerPC processors) should enable it. For linear frame buffer writes, the bytes within a
word are swizzled prior to being modified by the other control bits of lfbMode. When byte swizzling is
enabled, bits(31:24) are swapped with bits(7:0), and bits(23:16) are swapped with bits(15:8).


Very Important Note: The order of swapping and swizzling operations for LFB writes is as follows: byte
swizzling is performed first on all incoming LFB data, as defined by lfbMode bit(12) and irrespective of
the LFB data format. After byte swizzling, 16-bit word swapping is performed as defined by lfbMode
bit(11). Note that 16-bit word swapping is never performed on LFB data when data formats 4 and 5 are
used. Also note that 16-bit word swapping is performed on the LFB data that was previously optionally
swizzled. Finally, after both swizzling and 16-bit word swapping are performed, the individual color
channels are selected as defined in lfbMode bits(10:9). Note that the color channels are selected on the
LFB data that was previously swizzled and/or swapped.
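
The ordering described in the note above can be sketched in C as follows. This is an illustration rather than a description of the hardware implementation: the function names are hypothetical, the write format is assumed to sit in lfbMode bits(3:0), and bit(11)=1 is assumed to mean "swap the two 16-bit halves".

```c
#include <stdint.h>

/* Byte swizzle: bits(31:24) <-> bits(7:0), bits(23:16) <-> bits(15:8). */
static uint32_t lfb_byte_swizzle(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* 16-bit word swap: exchange the upper and lower halves. */
static uint32_t lfb_word_swap(uint32_t w)
{
    return (w >> 16) | (w << 16);
}

/* Apply swizzle first, then the word swap, as the note requires.
 * Color lane selection (bits(10:9)) would follow and is not shown. */
static uint32_t lfb_preprocess(uint32_t host_data, uint32_t lfb_mode)
{
    unsigned format = lfb_mode & 0xFu;      /* assumed: format in bits(3:0) */
    uint32_t w = host_data;

    if (lfb_mode & (1u << 12))
        w = lfb_byte_swizzle(w);
    if ((lfb_mode & (1u << 11)) && format != 4 && format != 5)
        w = lfb_word_swap(w);               /* never applied to formats 4 and 5 */
    return w;
}
```

With both bits set and format 0, the host word 0x11223344 is first swizzled to 0x44332211 and then swapped to 0x22114433.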


Bit(13) of lfbMode is used to define the origin of the Y coordinate for all linear frame buffer writes when
the pixel pipeline is bypassed (lfbMode bit(8)=0). Note that bit(13) of lfbMode does not affect rendering
operations (FASTFILL and TRIANGLE commands) -- bit(17) of fbzMode defines the origin of the Y
coordinate for rendering operations. Note also that if the pixel pipeline is enabled for linear frame buffer
writes (lfbMode bit(8)=1), then fbzMode bit(17) is used to determine the location of the Y origin. When
bit(13) is cleared, the Y origin (Y=0) for all linear frame buffer accesses is defined to be at the top of the
screen. When bit(13) is set, the Y origin for all linear frame buffer accesses is defined to be at the bottom
of the screen.
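
As a brief sketch of the selection just described, the bit that decides the Y origin of an LFB write depends on whether the pixel pipeline is enabled. The function name below is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the Y origin for an LFB write is at the bottom of the
 * screen, false when it is at the top. */
static bool lfb_write_y_origin_bottom(uint32_t lfb_mode, uint32_t fbz_mode)
{
    bool pipeline_enabled = (lfb_mode >> 8) & 1u;

    if (pipeline_enabled)
        return (fbz_mode >> 17) & 1u;   /* fbzMode bit(17) applies */
    return (lfb_mode >> 13) & 1u;       /* lfbMode bit(13) applies */
}
```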


Bit(14) of lfbMode is used to select the W component used for LFB writes processed through the pixel
pipeline. If bit(14)=0, then the MSBs of the fractional component of the 48-bit W value passed to the pixel
pipeline for LFB writes are taken from the Z value associated with the LFB write. [Note that
the Z value associated with the LFB write is dependent on the LFB format, and is either passed down
pixel-by-pixel from the CPU, or is set to the constant zaColor]. If bit(14)=1, then the MSBs of the fractional
component of the 48-bit W value passed to the pixel pipeline for LFB writes are taken from zaColor(23:0).
Regardless of the setting of bit(14), when LFB writes go through the pixel pipeline, all other bits except the
16 MSBs of the fractional component of the W value are set to 0x0. Note that bit(14) is ignored if LFB writes
bypass the pixel pipeline.


8.24.1 Linear Frame Buffer Writes 


Linear frame buffer writes -- format 0: 

When writing to the linear frame buffer with 16-bit format 0 (RGB 5-6-5), the RGB channel format
specifies the RGB ordering within a 16-bit word. If the Napalm pixel pipeline is enabled for LFB accesses
(lfbMode bit(8)=1), then alpha and depth information for LFB format 0 is taken from the zaColor register.
The following table shows the color channels for 16-bit linear frame buffer access format 0:


RGB Channel     16-bit Linear frame
Format Value    buffer access bits
0               Red(15:11), Green(10:5), Blue(4:0)


Linear frame buffer writes -- format 1: 

When writing to the linear frame buffer with 16-bit format 1 (RGB 5-5-5), the RGB channel format
specifies the RGB ordering within a 16-bit word. If the Napalm pixel pipeline is enabled for LFB accesses
(lfbMode bit(8)=1), then alpha and depth information for LFB format 1 is taken from the zaColor register.


The following table shows the color channels for 16-bit linear frame buffer access format 1:


RGB Channel     16-bit Linear frame
Format Value    buffer access bits
0               Ignored(15), Red(14:10), Green(9:5), Blue(4:0)


Linear frame buffer writes -- format 2: 

When writing to the linear frame buffer with 16-bit format 2 (ARGB 1-5-5-5), the RGB channel format
specifies the RGB ordering within a 16-bit word. If the Napalm pixel pipeline is enabled for LFB accesses
(lfbMode bit(8)=1), then depth information for LFB format 2 is taken from the zaColor register. Note that
the 1-bit alpha value passed when using LFB format 2 is bit-replicated to yield the 8-bit alpha used in the
pixel pipeline. The following table shows the color channels for 16-bit linear frame buffer access format 2:


RGB Channel     16-bit Linear frame
Format Value    buffer access bits
0               Alpha(15), Red(14:10), Green(9:5), Blue(4:0)
1               Alpha(15), Blue(14:10), Green(9:5), Red(4:0)
2               Red(15:11), Green(10:6), Blue(5:1), Alpha(0)
3               Blue(15:11), Green(10:6), Red(5:1), Alpha(0)
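
To make the format 2 layout concrete, the sketch below unpacks one 16-bit pixel for each of the four RGB channel format values and bit-replicates the 1-bit alpha to 8 bits. The type and function names are hypothetical, and the 5-bit color components are returned unexpanded because the text does not describe how they are widened.

```c
#include <stdint.h>

/* a is 0x00 or 0xFF after bit replication; r, g, b hold raw 5-bit values. */
typedef struct {
    uint8_t a, r, g, b;
} lfb_pixel_1555;

/* lanes is the RGB channel format value from lfbMode bits(10:9). */
static lfb_pixel_1555 decode_lfb_format2(uint16_t px, unsigned lanes)
{
    unsigned alpha, hi, mid, lo;
    lfb_pixel_1555 out;

    if ((lanes & 3u) < 2u) {            /* ARGB / ABGR: alpha in bit 15 */
        alpha = (px >> 15) & 1u;
        hi  = (px >> 10) & 0x1Fu;
        mid = (px >> 5) & 0x1Fu;
        lo  = px & 0x1Fu;
    } else {                            /* RGBA / BGRA: alpha in bit 0 */
        hi  = (px >> 11) & 0x1Fu;
        mid = (px >> 6) & 0x1Fu;
        lo  = (px >> 1) & 0x1Fu;
        alpha = px & 1u;
    }

    out.a = alpha ? 0xFFu : 0x00u;      /* replicate the single alpha bit */
    out.g = (uint8_t)mid;
    if ((lanes & 1u) == 0u) {           /* ARGB, RGBA: red is the upper field */
        out.r = (uint8_t)hi;
        out.b = (uint8_t)lo;
    } else {                            /* ABGR, BGRA: blue is the upper field */
        out.r = (uint8_t)lo;
        out.b = (uint8_t)hi;
    }
    return out;
}
```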


Linear frame buffer writes -- format 3: 


Linear frame buffer format 3 is an unsupported format. 


Linear frame buffer writes -- format 4: 

When writing to the linear frame buffer with 24-bit format 4 (RGB x-8-8-8), the RGB channel format
specifies the RGB ordering within a 24-bit word. Note that the alpha/A channel is ignored for 24-bit
access format 4. Also note that while only 24 bits of data are transferred for format 4, all data accesses must
be 32-bit aligned -- packed 24-bit writes are not supported by Napalm. If the Napalm pixel pipeline is
enabled for LFB accesses (lfbMode bit(8)=1), then alpha and depth information for LFB format 4 is taken
from the zaColor register. The following table shows the color channels for 24-bit linear frame buffer
access format 4:


RGB Channel     24-bit Linear frame buffer access bits
Format Value    (aligned to 32-bits)
0               Ignored(31:24), Red(23:16), Green(15:8), Blue(7:0)
1               Ignored(31:24), Blue(23:16), Green(15:8), Red(7:0)
2               Red(31:24), Green(23:16), Blue(15:8), Ignored(7:0)
3               Blue(31:24), Green(23:16), Red(15:8), Ignored(7:0)


Linear frame buffer writes -- format 5: 

When writing to the linear frame buffer with 32-bit format 5 (ARGB 8-8-8-8), the RGB channel format
specifies the ARGB ordering within a 32-bit word. If the Napalm pixel pipeline is enabled for LFB
accesses (lfbMode bit(8)=1), then depth information for LFB format 5 is taken from the zaColor register.
The following table shows the color channels for 32-bit linear frame buffer access format 5:


RGB Channel     32-bit Linear frame
Format Value    buffer access bits
0               Alpha(31:24), Red(23:16), Green(15:8), Blue(7:0)
1               Alpha(31:24), Blue(23:16), Green(15:8), Red(7:0)
2               Red(31:24), Green(23:16), Blue(15:8), Alpha(7:0)
3               Blue(31:24), Green(23:16), Red(15:8), Alpha(7:0)
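
The four lane orderings for format 5 can be sketched as a simple byte shuffle. The names `lfb_pixel_8888` and `decode_lfb_format5` are hypothetical; the input is assumed to be the 32-bit word after any byte swizzling.

```c
#include <stdint.h>

typedef struct {
    uint8_t a, r, g, b;
} lfb_pixel_8888;

/* lanes is the RGB channel format value from lfbMode bits(10:9). */
static lfb_pixel_8888 decode_lfb_format5(uint32_t w, unsigned lanes)
{
    uint8_t b3 = (uint8_t)(w >> 24);    /* bits(31:24) */
    uint8_t b2 = (uint8_t)(w >> 16);    /* bits(23:16) */
    uint8_t b1 = (uint8_t)(w >> 8);     /* bits(15:8)  */
    uint8_t b0 = (uint8_t)w;            /* bits(7:0)   */

    switch (lanes & 3u) {
    case 0:  /* ARGB */
        return (lfb_pixel_8888){ .a = b3, .r = b2, .g = b1, .b = b0 };
    case 1:  /* ABGR */
        return (lfb_pixel_8888){ .a = b3, .b = b2, .g = b1, .r = b0 };
    case 2:  /* RGBA */
        return (lfb_pixel_8888){ .r = b3, .g = b2, .b = b1, .a = b0 };
    default: /* BGRA */
        return (lfb_pixel_8888){ .b = b3, .g = b2, .r = b1, .a = b0 };
    }
}
```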


Linear frame buffer writes -- formats 6-7: 
Linear frame buffer formats 6-7 are unsupported formats.


Linear frame buffer writes -- format 12:

When writing to the linear frame buffer with 32-bit format 12 (Depth 16, RGB 5-6-5), the RGB channel
format specifies the RGB ordering within the 32-bit word. If the Napalm pixel pipeline is enabled for LFB
accesses (lfbMode bit(8)=1), then alpha information for LFB format 12 is taken from the zaColor register.


Note that the format of the depth value passed when using LFB format 12 must precisely match the format
of the type of depth buffering being used (either 16-bit integer Z or 16-bit floating point 1/W). The
following table shows the 16-bit color channels within the 32-bit linear frame buffer access format 12:


RGB Channel     16-bit Linear frame
Format Value    buffer access bits


Linear frame buffer writes -- format 13: 


When writing to the linear frame buffer with 32-bit format 13 (Depth 16, RGB x-5-5-5), the RGB channel
format specifies the RGB ordering within the 32-bit word. If the Napalm pixel pipeline is enabled for LFB
accesses (lfbMode bit(8)=1), then alpha information for LFB format 13 is taken from the zaColor register.


Note that the format of the depth value passed when using LFB format 13 must precisely match the format
of the type of depth buffering being used (either 16-bit integer Z or 16-bit floating point 1/W). The
following table shows the 16-bit color channels within the 32-bit linear frame buffer access format 13:


RGB Channel     16-bit Linear frame
Format Value    buffer access bits
0               Ignored(15), Red(14:10), Green(9:5), Blue(4:0)
1               Ignored(15), Blue(14:10), Green(9:5), Red(4:0)
2               Red(15:11), Green(10:6), Blue(5:1), Ignored(0)
3               Blue(15:11), Green(10:6), Red(5:1), Ignored(0)


Linear frame buffer writes -- format 14: 

When writing to the linear frame buffer with 32-bit format 14 (Depth 16, ARGB 1-5-5-5), the RGB
channel format specifies the RGB ordering within the 32-bit word. Note that the format of the depth value
passed when using LFB format 14 must precisely match the format of the type of depth buffering being
used (either 16-bit integer Z or 16-bit floating point 1/W). Also note that the 1-bit alpha value passed
when using LFB format 14 is bit-replicated to yield the 8-bit alpha used in the pixel pipeline. The
following table shows the 16-bit color channels within the 32-bit linear frame buffer access format 14:


RGB Channel     16-bit Linear frame
Format Value    buffer access bits
0               Alpha(15), Red(14:10), Green(9:5), Blue(4:0)
1               Alpha(15), Blue(14:10), Green(9:5), Red(4:0)
2               Red(15:11), Green(10:6), Blue(5:1), Alpha(0)
3               Blue(15:11), Green(10:6), Red(5:1), Alpha(0)


When running in 32BPP rendering mode, the incoming 16-bit depth values are converted to the 24-bit
depth required by the method described in the description of linear frame buffer write format 12.


Linear frame buffer writes -- format 15: 

When writing to the linear frame buffer with 32-bit format 15 (Depth 16, Depth 16), the format of the
depth values passed must precisely match the format of the type of depth buffering being used (either
16-bit integer Z or 16-bit floating point 1/W). If the Napalm pixel pipeline is enabled for LFB accesses
(lfbMode bit(8)=1), then RGB color information is taken from the color1 register, and alpha information
for LFB format 15 is taken from the zaColor register. When running in 32BPP rendering mode, the
incoming 16-bit depth values are converted to the 24-bit depth required by the method described in the
description of linear frame buffer write format 12.


Queued VMI host port writes -- format 9: 


When writing to the linear frame buffer with format 9, no data is written to the frame buffer or passed
down the 3D pipeline. Instead, writes are performed to the VMI host port. When LFB write format 9 is
specified, all other bits in lfbMode are ignored. For LFB writes using format 9, each 32-bit word that is
transferred holds both the 8-bit data that is to be written to the VMI host port, and also some control and
address information. The 32-bit word written for LFB write format 9 is as follows:


When a write to the 3D linear frame buffer space is received and lfbMode bits(3:0)=0x9, then the address
of the 3D LFB access is ignored and the 32-bit data which is written is used to control the queued VMI host
port write. The address and data information for the VMI host port write are stored in the data word
bits(11:0). Bit(12) of the data word is used to control the vmi_rw signal -- this value will typically be 0 for
active low write enables (the standard configuration for VMI). Similarly, bit(13) of the data word is used to
control the vmi_cs_n signal -- again, this value is typically 0 for standard VMI implementations.
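
The control fields of a format 9 data word can be pulled apart as sketched below. The struct and function names are hypothetical. Bits(11:0) are kept as a single field because the text does not state how address and data are split within them, and bits 14 and 15 (access mode and slow timing, described in the next paragraph) are returned as raw bits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t addr_data;    /* data word bits(11:0): VMI address and data */
    bool     vmi_rw;       /* bit(12): drives the vmi_rw signal          */
    bool     vmi_cs_n;     /* bit(13): drives the vmi_cs_n signal        */
    bool     mode_select;  /* bit(14): Mode A / Mode B selection (raw)   */
    bool     slow_timing;  /* bit(15): slows the write state machine     */
} vmi_queued_write;

static vmi_queued_write decode_vmi_word(uint32_t w)
{
    vmi_queued_write q;

    q.addr_data   = (uint16_t)(w & 0x0FFFu);
    q.vmi_rw      = (w >> 12) & 1u;
    q.vmi_cs_n    = (w >> 13) & 1u;
    q.mode_select = (w >> 14) & 1u;
    q.slow_timing = (w >> 15) & 1u;
    return q;
}
```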


Bit(14) of the data word is used to specify which access mode, either Mode A or Mode B (see the VMI 
specification for a description of these two host port access modes) the queued VMI host port write uses. 
Whether to use Mode A or Mode B is implementation specific. Bit(15) of the data word is used to control 
the access speed of the VMI host port write state machine. Setting bit(15)=1 will slow down all timing 
used by the state machine by 4 normal speed. Bit(15) should not have to be used for normal operation, but


8.25 fbzMode Register


The fbzMode register controls frame buffer and depth buffer rendering functions of the Napalm processor.
Bits in fbzMode control clipping, chroma-keying, depth-buffering, dithering, and masking.


Bit    Description
0      Enable clipping rectangle (1=enable)
3      W-Buffer Select (0=Use Z-value for depth buffering, 1=Use W-value for depth buffering)
8      Enable dithering (1=enable)
9      RGB buffer write mask (0=disable writes to RGB buffer)
17     Rendering commands Y origin (0=top of screen is origin, 1=bottom of screen is origin)
21     Depth float select (0=iterated W is used for floating point depth buffering, 1=iterated Z is
       used for floating point depth buffering)


Bit(0) of fbzMode is used to enable the clipping rectangle. When set, clipping to the rectangle defined by
the clipLeftRight and clipBottomTop registers inclusive is enabled. When clipping is enabled, the
bounding clipping rectangle must always be less than or equal to the screen resolution in order to clip to
screen coordinates. Also note that if clipping is not enabled, rendering must not occur outside of the screen
resolution.


Bit(1) of fbzMode is used to enable the color compare check (chroma-keying). When enabled,
any source pixel matching the color specified in the chromaKey register is not written to the RGB buffer.
The chroma-key color compare is performed immediately after texture mapping lookup, but before the
color combine unit and fog in the pixel datapath.


Bit(2) of fbzMode is used to enable stipple register masking. When enabled, bit(12) of fbzMode is used to
determine the stipple mode -- bit(12)=0 specifies stipple rotate mode, while bit(12)=1 specifies stipple
pattern mode.


When stipple register masking is enabled and stipple rotate mode is selected, bit(31) of the stipple register 
is used to mask pixels in the pixel pipeline. For all triangle commands and linear frame buffer writes 
through the pixel pipeline, pixels are invalidated in the pixel pipeline if stipple bit(31)=0 and stipple 
register masking is enabled in stipple rotate mode. After an individual pixel is processed in the pixel 
pipeline, the stipple register is rotated from right-to-left, with the value of bit(0) filled with the value of 
bit(31). Note that the stipple register is rotated regardless of whether stipple masking is enabled (bit(2) in 
fbzMode) when in stipple rotate mode. 
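
Stipple rotate mode can be modeled per pixel as a test of bit(31) followed by a one-bit left rotation. The sketch below assumes the caller keeps the stipple register value in a variable; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Processes one pixel in stipple rotate mode.
 * Returns true if the pixel stays valid, and always rotates *stipple
 * left by one bit (bit(0) is filled with the old bit(31)). */
static bool stipple_rotate_step(uint32_t *stipple, bool masking_enabled)
{
    bool bit31 = (*stipple >> 31) & 1u;
    bool keep = !masking_enabled || bit31;   /* invalidate when bit(31)=0 */

    /* Rotation happens whether or not masking is enabled. */
    *stipple = (*stipple << 1) | (*stipple >> 31);
    return keep;
}
```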


When stipple register masking is enabled and stipple pattern mode is selected, the spatial <x,y> coordinates
of a pixel processed in the pixel pipeline are used to look up a 4x8 monochrome pattern stored in the stipple
register -- the resultant lookup value is used to mask pixels in the pixel pipeline. For all triangle commands
and linear frame buffer writes through the pixel pipeline, a stipple bit is selected from the stipple register
as follows:

    switch(pixel_Y[1:0]) {
        case 0: stipple_Y_sel[7:0] = stipple[7:0];
        case 1: stipple_Y_sel[7:0] = stipple[15:8];
        case 2: stipple_Y_sel[7:0] = stipple[23:16];
        case 3: stipple_Y_sel[7:0] = stipple[31:24];
    }

    switch(pixel_X[2:0]) {
        case 0: stipple_mask_bit = stipple_Y_sel[7];
        case 1: stipple_mask_bit = stipple_Y_sel[6];
        case 2: stipple_mask_bit = stipple_Y_sel[5];
        case 3: stipple_mask_bit = stipple_Y_sel[4];
        case 4: stipple_mask_bit = stipple_Y_sel[3];
        case 5: stipple_mask_bit = stipple_Y_sel[2];
        case 6: stipple_mask_bit = stipple_Y_sel[1];
        case 7: stipple_mask_bit = stipple_Y_sel[0];
    }


If the stipple_mask_bit=0, the pixel is invalidated in the pixel pipeline when stipple register masking is
enabled and stipple pattern mode is selected. Note that when stipple pattern mode is selected the stipple
register is never rotated.
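
The pattern-mode lookup reduces to two shifts. The following C sketch mirrors the selection above; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns the stipple mask bit (0 or 1) for pixel (x, y) in stipple
 * pattern mode. Y[1:0] selects one of four bytes, X[2:0] selects a bit
 * within that byte, with X=0 mapping to the most significant bit. */
static unsigned stipple_pattern_bit(uint32_t stipple, unsigned x, unsigned y)
{
    uint32_t row = (stipple >> (8u * (y & 3u))) & 0xFFu;

    return (unsigned)((row >> (7u - (x & 7u))) & 1u);
}
```

A result of 0 means the pixel is invalidated when stipple register masking is enabled.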


Bits(4:3) specify the depth-buffering function during rendering operations. The depth buffering pipeline is
shown below:


[Figure: depth buffering pipeline]

Inputs: iterated Z[31:0] (unclamped), iterated W[47:0] (unclamped), zaColor[15:0], old Depth (from
Depth Buffer).
Control signals: depthfloat_select, wfloat_select, zbias_enable, zfunc_lt, zfunc_eq, Depth Buffer enable.
Outputs: To Fog Unit, Depth test pass.

Iterated Z is treated as a 4.28 value and zero extended to 48 bits so that its decimal point lines up with
the 16.32 w-term. depthfloat_select chooses between this value and iterated W[47:0] to form floatSel.
The integer depth path uses iterated Z[27:12], clamped (16 bits, integer only).

Float conversion of the w iterator:

    if(|w-iter[47:32]) {
        mant = 0, exp = 0xf, underflow = 1
    } else if(!|w-iter[31:16]) {
        mant = 1, exp = 0xf, underflow = 0
    } else {
        exp = find_first_one(w-iter[31:16])
        mant = (w-iter[30:16] << exp), underflow = 0
    }

Float conversion of floatSel:

    if(|floatSel[47:32]) {
        mant = 0, exp = 0xf, underflow = 1
    } else if(!|floatSel[31:16]) {
        mant = 1, exp = 0xf, underflow = 0
    } else {
        exp = find_first_one(floatSel[31:16])
        mant = (floatSel[30:16] << exp), underflow = 0
    }

wfloat format: 1.<mant> * 2^exp (4-bit exponent, 12-bit mantissa)

Adder logic (depth bias):

1. Sign extend 16-bit zaColor to 18 bits
2. Convert 16-bit depth to 18-bit {underflow,underflow,depth}
3. Add 18-bit values
4. Clamp to 0-FFFF

Bit(4) of fbzMode is used to enable depth-buffering. When depth buffering is enabled, a depth
comparison is performed for each source pixel as defined in bits(7:5). When bit(3)=0, the z iterator is used
for the depth buffer comparison. When bit(3)=1, the w iterator is used for the depth buffer comparison.
When bit(3)=1 enabling w-buffering, the inverse of the normalized w iterator is used for the depth-buffer
comparison. This in effect implements a floating-point w-buffering scheme utilizing a 4-bit exponent and
a 12-bit mantissa. The inverted w iterator is used so that the same depth buffer comparisons can be used as
with a typical z-buffer. Section 8.25.1 below further describes the depth-buffering algorithm.
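
A depth comparison selected by a 3-bit function code can be sketched as below. The encoding used here (bit 0 = "less than", bit 1 = "equal", bit 2 = "greater than", giving 0 = never through 7 = always) is an assumption consistent with the zfunc_lt and zfunc_eq signals in the pipeline figure. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluates "DEPTHsrc DepthOP DEPTHdst" for a 3-bit DepthOP code
 * (assumed to come from fbzMode bits(7:5)). */
static bool depth_test(unsigned depth_op, uint16_t src, uint16_t dst)
{
    bool lt = (depth_op & 1u) && (src < dst);
    bool eq = (depth_op & 2u) && (src == dst);
    bool gt = (depth_op & 4u) && (src > dst);

    return lt || eq || gt;
}
```

Under this assumed encoding, code 3 passes when the source depth is less than or equal to the stored depth, and code 0 never passes.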


Bit(8) of fbzMode enables 16-bit color dithering. When enabled, native 24-bit source pixels are dithered 
into 16-bit RGB color values with no performance penalty. When dithering is disabled, native 24-bit 
source pixels are converted into 16-bit RGB color values by bit truncation. When dithering is enabled, 
bit(11) of fbzMode defines the dithering algorithm -- when bit(11)=0 a 4x4 ordered dither algorithm is 
used, and when bit(11)=1 a 2x2 ordered dither algorithm is used to convert 24-bit RGB pixels into 16-bit 
frame buffer colors. 


Bit(9) of fbzMode enables writes to the RGB buffers. Clearing bit(9) invalidates all writes to the RGB 
buffers, and thus the RGB buffers remain unmodified for all rendering operations. Bit(9) must be set for 
normal drawing into the RGB buffers. 


Bit(10) enables writes to the depth-buffer/alpha buffer. When cleared, writes to the depth-buffer are
invalidated, and the depth-buffer state is unmodified for all rendering operations. Bit(10) must be set for
normal depth-buffered operation.


Bit(13) of fbzMode enables the alpha-channel mask. When enabled, bit(0) of the incoming alpha value is 
used to mask writes to the color and depth buffers. If alpha channel masking is enabled and bit(0) of the 
incoming alpha value is 0, then the pixel is invalidated in the pixel pipeline, the fbiAfuncFail register is 
incremented, and no drawing occurs to the color or depth buffers. If alpha channel masking is enabled and 
bit(0) of the incoming alpha value is 1, then the pixel is drawn normally subject to depth function, alpha 
blending function, alpha test, and color/depth masking. 


Bit(16) of fbzMode is used to enable the Depth Buffer bias. When bit(16)=1, the calculated depth value
(irrespective of Z or 1/W type of depth buffering selected) is added to bits(15:0) of zaColor. Depth
buffer biasing is used to eliminate aliasing artifacts when rendering co-planar polygons.
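
The four adder steps listed in the depth buffering pipeline figure (sign extend, widen, add, clamp) can be sketched in C. This assumes the 18-bit addition is a signed two's-complement add, which the figure implies but does not state. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Applies the depth bias from zaColor bits(15:0) to a 16-bit depth value. */
static uint16_t apply_depth_bias(uint16_t depth, bool underflow, uint16_t bias16)
{
    /* 1. Sign extend the 16-bit bias. */
    int32_t bias = (bias16 & 0x8000u) ? (int32_t)bias16 - 0x10000 : (int32_t)bias16;

    /* 2. Widen depth to {underflow, underflow, depth}: both upper bits set
     *    makes the 18-bit value negative. */
    int32_t wide = underflow ? (int32_t)depth - 0x10000 : (int32_t)depth;

    /* 3. Add. The sum always fits in 18 signed bits. */
    int32_t sum = wide + bias;

    /* 4. Clamp to 0-FFFF. */
    if (sum < 0)
        return 0x0000u;
    if (sum > 0xFFFF)
        return 0xFFFFu;
    return (uint16_t)sum;
}
```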


Bit(17) of fbzMode is used to define the origin of the Y coordinate for rendering operations (FASTFILL
and TRIANGLE commands) and for linear frame buffer writes when the pixel pipeline is enabled for linear
frame buffer writes (lfbMode bit(8)=1). Note that bit(17) of fbzMode does not affect linear frame buffer
writes when the pixel pipeline is bypassed for linear frame buffer writes (lfbMode bit(8)=0), as in this
situation bit(13) of lfbMode specifies the Y origin for linear frame buffer writes. When cleared, the Y
origin (Y=0) for all rendering operations and linear frame buffer writes when the pixel pipeline is enabled
is defined to be at the top of the screen. When bit(17) is set, the Y origin is defined to be at the bottom of
the screen.


Bit(18) of fbzMode is used to enable the destination alpha planes. When set, the auxiliary buffer will be
used as destination alpha planes. Note that if bit(18) of fbzMode is set, depth buffering cannot be used,
and thus bit(4) of fbzMode (enable depth buffering) must be set to 0x0.


Bit(19) of fbzMode is used to enable dither subtraction on the destination color during alpha blending. 
When dither subtraction is enabled (fbzMode bit(19)=1), the dither matrix used to convert 24-bit color to 
16-bit color is subtracted from the destination color before applying the alpha-blending algorithm. 


Enabling dither subtraction is used to enhance image quality when performing alpha-blending. 


Bit(20) of fbzMode is used to select the source depth value used for depth buffering. When fbzMode
bit(20)=0, the source depth value used for the depth buffer comparison is either iterated Z or iterated W (as
selected by fbzMode bit(3)) and may be biased (as controlled by fbzMode bit(16)). When fbzMode
bit(20)=1, the constant depth value defined by zaColor is used as the source depth value for the depth
buffer comparison. Regardless of the state of fbzMode bit(20), the biased iterated Z/W is written into the
depth buffer if the depth buffer function passes.


Bit(21) of fbzMode is used to select either the w iterator or the z iterator to be used for floating point
depth buffering. Floating point depth buffering is enabled when fbzMode bit(4)=1. When fbzMode
bit(21)=0, then the unclamped w iterator is converted to a 4.12 floating point representation and used for
depth buffering. When fbzMode bit(21)=1, then the unclamped z iterator is converted into a 4.12 floating
point format and used for depth buffering.


8.25.1 Depth-buffering function 


When depth-buffering is enabled (fbzMode bit(4)=1), the following depth comparison is performed:

    DEPTHsrc DepthOP DEPTHdst

where DEPTHsrc and DEPTHdst represent the depth source and destination values respectively. A source
pixel is written into an RGB buffer if the depth comparison is true and writing into the RGB buffer is
enabled (fbzMode bit(9)=1). The source depth value is written into the depth buffer if the depth
comparison is true and writing into the depth buffer is enabled (fbzMode bit(10)=1). The supported depth
comparison functions (DepthOPs) are shown below:


Value    DepthOP Function
0        never
1        less than
2        equal
3        less than or equal
4        greater than
5        not equal
6        greater than or equal
7        always


8.26 renderMode Register


The renderMode register controls 3D rendering functions of the Napalm processor. Bits in renderMode 
control color pixel depth, stenciling, and Y-Origin swapping. The default value of renderMode[31:0] is 
0x0. 


Bit      Description
1:0      3D Rendering mode (0=16BPP, 1=15BPP (1555), 2=32BPP, 3=reserved)
2        Y-Origin subtraction value select (0=use miscInit0[29:18], 1=use renderMode[14:3])
14:3     Y-Origin subtraction value. Used when renderMode[2] = 1.
16:15    1-bit alpha rendering mode (0=force to 0, 1=force to 1, 2=use alpha channel MSB,
         3=reserved). Only used when 15 BPP rendering is enabled (renderMode[1:0]=0x1)
17       Red buffer write mask (0=disable writes to the red buffer). Only used when 32BPP
         rendering is enabled (renderMode[1:0]=0x2)
18       Green buffer write mask (0=disable writes to the green buffer). Only used when 32BPP
         rendering is enabled (renderMode[1:0]=0x2)
19       Blue buffer write mask (0=disable writes to the blue buffer). Only used when 32BPP
         rendering is enabled (renderMode[1:0]=0x2)
20       Alpha buffer write mask (0=disable writes to the alpha buffer). Only used when 32BPP
         rendering is enabled (renderMode[1:0]=0x2)
21       Enable triangle guardband clipping (1=enable)
24:22    2 pixel-per-clock rendering band selection
25       Enable dither rotation (1=enable)


Bits(1:0) of renderMode are used to select the color mode of the 3D rendering surface.
renderMode(1:0)=0 selects 16BPP (565 RGB) 3D rendering mode, renderMode(1:0)=1 selects 15BPP
(1555 ARGB) 3D rendering mode, and renderMode(1:0)=2 selects 32BPP (8888 ARGB) 3D rendering
mode.


Bit(2) of renderMode is used to control which Y-Origin subtraction value to use. When bit(2)=0, the 12- 
bit value in miscInit0[29:18] is used as the Y-Origin subtraction value. This mode is included for legacy 
compatibility. When renderMode[2]=1, the 12-bit value stored in renderMode[14:3] is used as the Y- 
Origin subtraction value. As renderMode is a queued register, software is able to dynamically change the 
Y-Origin subtraction value without idling the Napalm 3D engine. 


Bits(16:15) of renderMode are used to control the 1-bit alpha channel when 1555 rendering is selected 
(renderMode[1:0]=0x1). When renderMode[16:15]=0, the 1-bit alpha value is always forced to 0, 
regardless of the source alpha channel value. Similarly, when renderMode[16:15]=1, the 1-bit alpha value 
is always forced to 1. When renderMode[16:15]=2, the most significant bit (MSB, bit 7) of the alpha 
channel is stored as the 1-bit alpha (note this is the MSB of the alpha channel after alpha-channel alpha 
blending). Note that bits(16:15) are not used when 1555 rendering is not enabled. See the fastfillCMD 
register description for how renderMode bits(16:15) work with the FASTFILL command. 
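
The 1-bit alpha selection for 1555 rendering can be sketched as a small switch on renderMode bits(16:15). The function name is hypothetical, and returning 0 for the reserved value 3 is a placeholder rather than documented behavior.

```c
#include <stdint.h>

/* Returns the 1-bit alpha stored in 1555 mode for an 8-bit post-blend alpha. */
static unsigned alpha_bit_1555(uint32_t render_mode, uint8_t alpha)
{
    switch ((render_mode >> 15) & 3u) {
    case 0:  return 0u;                  /* always forced to 0 */
    case 1:  return 1u;                  /* always forced to 1 */
    case 2:  return (alpha >> 7) & 1u;   /* MSB (bit 7) of the alpha channel */
    default: return 0u;                  /* reserved: placeholder value only */
    }
}
```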


Bits(19:17) of renderMode enable writes to the red, green, and blue color planes (bits 17, 18, and 19
respectively). Clearing one of these bits invalidates all writes to the respective color plane, and thus the
particular color plane remains unmodified for all rendering operations. Bits(19:17) must be set for normal
drawing into the color buffers. It is important to note that fbzMode bit(9) must be set to enable writes to
any of the individual color planes, independent of the status of renderMode bits(19:17). Note that
renderMode bits(19:17) are only used when 32 BPP rendering is enabled.


Bit(20) of renderMode enables writes to the alpha buffer. Clearing bit(20) invalidates all writes to the
alpha buffer, and thus the alpha buffer remains unmodified for all rendering operations. Bit(20) must be set
for normal drawing into the alpha buffer. It is important to note that fbzMode bit(9) must be set to enable
writes to the alpha buffer when running in 32 BPP rendering mode, independent of the status of
renderMode bit(20). Note that renderMode bit(20) is only used when 32 BPP rendering is enabled.
When either the 15 BPP or 16 BPP rendering mode is enabled, fbzMode bit(10) is used to enable writes to
the alpha buffer. When 32 BPP rendering is enabled, fbzMode bit(10) is used to enable writes to the depth
buffer.
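
The interaction between the fbzMode masks and the renderMode per-plane masks can be sketched as follows. The names are hypothetical, and the assignment of red, green, and blue to bits 17, 18, and 19 follows the order of the renderMode bit table, which is an assumption. In 15/16 BPP modes the alpha and depth planes share the auxiliary buffer, so both are reported from fbzMode bit(10).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool red, green, blue, alpha, depth;
} plane_write_enables;

static plane_write_enables get_write_enables(uint32_t fbz_mode, uint32_t render_mode)
{
    bool rgb_mask = (fbz_mode >> 9) & 1u;    /* fbzMode bit(9)  */
    bool aux_mask = (fbz_mode >> 10) & 1u;   /* fbzMode bit(10) */
    bool is_32bpp = (render_mode & 3u) == 2u;
    plane_write_enables e;

    if (is_32bpp) {
        /* Per-plane masks apply, and fbzMode bit(9) gates all of them. */
        e.red   = rgb_mask && ((render_mode >> 17) & 1u);
        e.green = rgb_mask && ((render_mode >> 18) & 1u);
        e.blue  = rgb_mask && ((render_mode >> 19) & 1u);
        e.alpha = rgb_mask && ((render_mode >> 20) & 1u);
        e.depth = aux_mask;
    } else {
        /* 15/16 BPP: renderMode bits(20:17) are not used. */
        e.red = e.green = e.blue = rgb_mask;
        e.alpha = aux_mask;
        e.depth = aux_mask;
    }
    return e;
}
```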


Bit(21) of renderMode enables the guardband clipping functionality of the triangle iterators. When
renderMode bit(21) is set, pixels which fall outside of the clipping rectangle defined by clipLeftRight1
and clipTopBottom1 will not be rendered. The triangle iterators are optimized to quickly discard pixels
outside of the guardband clipping region to improve rasterization performance. To define a guardband
clipping region, the value of clipTop1 must be less than clipBottom1 and the value of clipLeft1 must be
less than clipRight1. The guardband clipping region is a rectangular region including the edges defined by
clipLeft1 and clipTop1, but excluding the edges defined by clipRight1 and clipBottom1. Note that when
guardband clipping is enabled, the left and right edges of the clipping rectangle defined in clipLeftRight1
must be aligned on even pixel boundaries.
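
The half-open guardband region and its constraints can be sketched with two small predicates. The function names and the plain integer parameters standing in for the clipLeft1, clipRight1, clipTop1, and clipBottom1 fields are hypothetical.

```c
#include <stdbool.h>

/* True if the rectangle satisfies the stated constraints:
 * top < bottom, left < right, and left/right on even pixel boundaries. */
static bool guardband_is_valid(int left, int right, int top, int bottom)
{
    return top < bottom && left < right &&
           (left % 2 == 0) && (right % 2 == 0);
}

/* True if pixel (x, y) lies inside the guardband region: the left and top
 * edges are included, the right and bottom edges are excluded. */
static bool guardband_contains(int x, int y,
                               int left, int right, int top, int bottom)
{
    return x >= left && x < right && y >= top && y < bottom;
}
```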


Bits(24:22) of renderMode control the scanline band selection when 2 pixel-per-clock operation is 
enabled. See the combineMode register description for more information on 2 pixel-per-clock rendering. 


Bit(25) of renderMode is used to enable rotation of the matrices used for dithering. When renderMode 
bit(25) is set, then the dither matrices are rotated as specified by fogMode bits(19:12). fogMode 
bits(13:12) control the dither matrix applied to colors after the alpha blending unit before they are stored 
into the frame buffer in either 555 or 565 format for triangle rendering. fogMode bits(15:14) control the 
“undither matrix” applied to convert the 555/565 destination colors into an 888 value before being used in 
the alpha blending unit as the Destination Color for triangle rendering. When anti-aliasing is enabled 
(aaCtrl[28]=1), then fogMode bits(17:16) are used to control the dither matrix applied to colors after the 
alpha blending unit before they are stored into the frame buffer in either 555 or 565 format for the second 
triangles drawn during AA rendering (i.e. the repeat triangles). Similarly, when anti-aliasing is enabled, 
fogMode[19:18] control the “undither matrix” applied to convert the 555/565 destination colors into an 
888 value before being used in the alpha blending unit as the Destination Color for the repeated triangles. 
Note that fogMode bits(19:12) are ignored when dithering is disabled (fbzMode[8]=0), and fogMode
bits(15:14) and bits(19:18) are ignored when dither subtraction is disabled (fbzMode[19]=0).
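
The four 2-bit selectors described above can be pulled out of a fogMode value with simple shifts and masks. The following is a minimal sketch in C; the struct and function names are hypothetical and only the bit positions are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container; field names are illustrative only. */
typedef struct {
    unsigned dither;          /* fogMode bits(13:12): dither matrix, first pass      */
    unsigned undither;        /* fogMode bits(15:14): undither matrix, first pass    */
    unsigned dither_repeat;   /* fogMode bits(17:16): dither matrix, AA repeat pass  */
    unsigned undither_repeat; /* fogMode bits(19:18): undither matrix, AA repeat pass */
} DitherRotation;

/* Extract the four matrix selectors from a fogMode value. */
static DitherRotation decode_dither_rotation(uint32_t fog_mode)
{
    DitherRotation r;
    r.dither          = (fog_mode >> 12) & 0x3u;
    r.undither        = (fog_mode >> 14) & 0x3u;
    r.dither_repeat   = (fog_mode >> 16) & 0x3u;
    r.undither_repeat = (fog_mode >> 18) & 0x3u;
    return r;
}

/* Replace fogMode bits(19:12) with new selectors, leaving other bits alone. */
static uint32_t encode_dither_rotation(uint32_t fog_mode, DitherRotation r)
{
    assert(r.dither < 4 && r.undither < 4 &&
           r.dither_repeat < 4 && r.undither_repeat < 4);
    fog_mode &= ~0x000FF000u;
    fog_mode |= (uint32_t)r.dither          << 12;
    fog_mode |= (uint32_t)r.undither        << 14;
    fog_mode |= (uint32_t)r.dither_repeat   << 16;
    fog_mode |= (uint32_t)r.undither_repeat << 18;
    return fog_mode;
}
```

Giving each pass a different selector value is what the text recommends for anti-aliased 15 and 16 BPP rendering; the helpers themselves do not touch renderMode bit(25), which must still be set for the rotation to take effect.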


The purpose of rotating the dither matrix is to improve the quality of anti-aliased rendering for 15 and 16 
BPP rendering. For highest quality anti-aliased rendering, each sub-sample rendered should have a 
different dither matrix. Each 2-bit field in fogMode selects from one of 4 different dither matrices as 
follows: 


Matrix #0 (default)
 0  8  2 10
12  4 14  6
 3 11  1  9
15  7 13  5
Matrix #1 

PAE) 14 2 

4 8 6 10 
15 3 alts ae 


a qi 7S 9 
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Stencil Function 


Never
Less than
Equal
Less than or equal
Greater than
Not equal
Greater than or equal
Always


When the stencil test fails, neither the color buffer nor the depth buffer is updated, and the stencil buffer may
be modified as controlled by the stencilOp register. When the stencil test passes, the color buffer and/or
the depth buffer may be updated depending on the depth test, and the stencil buffer may be modified as
controlled by the stencilOp register. When the stencil buffer is updated, only those bits whose
corresponding bit position in stencilMode bits(23:16) is set are written into the stencil buffer. See the
description of the stencilOp register for more detail on the stenciling operations supported by Napalm.


Note that the value in stencilMode[7:0] is used as the constant stencil value stored into the stencil buffer 
by the FASTFILL command. 


8.28 stencilOp Register 


The stencilOp register specifies what happens to the stored stencil value while stenciling is enabled. It
selects the operation to perform depending upon whether the pixel fails the stencil test, fails the depth
test, or passes the depth test. Note that the stenciling functions can only be used when 32 BPP rendering
is enabled.


3:0    Stencil Fail operation (see table below)
7:4    Stencil Z Fail operation (see table below)
11:8   Stencil Z Pass operation (see table below)


The stencilOp register specifies what happens to the stencil buffer while stenciling is enabled
(stencilMode bit(24)=1). If the stencil test fails, no change is made to the pixel's color or depth buffers,
and stencilOp bits(3:0) specify what happens to the stencil buffer contents. If the stencil test passes and
the depth test fails, then stencilOp bits(7:4) specify what happens to the stencil buffer contents.
Similarly, if the stencil test passes and the depth test passes, then stencilOp bits(11:8) specify what
happens to the stencil buffer contents. Note that if the stencil test passes, the color and/or depth
buffers are updated based on the result of the depth test. Also note that if the depth buffer is disabled,
the depth check is assumed to have passed and only the Stencil Fail and Stencil Z Pass operations will be
used.
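
The selection rules above, together with the stencilMode bits(23:16) write mask, can be sketched in C as follows. This is only an illustration of the described behavior under the stated rules; the type and function names are hypothetical, and the meaning of each 4-bit operation code is left to the operation table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the depth test outcome for one pixel. */
typedef enum { DEPTH_DISABLED, DEPTH_FAILED, DEPTH_PASSED } DepthResult;

/* Return the 4-bit operation code from stencilOp that applies to a pixel. */
static unsigned select_stencil_op(uint32_t stencil_op, bool stencil_pass,
                                  DepthResult depth)
{
    if (!stencil_pass)
        return stencil_op & 0xFu;             /* bits(3:0): Stencil Fail    */
    if (depth == DEPTH_FAILED)
        return (stencil_op >> 4) & 0xFu;      /* bits(7:4): Stencil Z Fail  */
    /* A disabled depth buffer is treated as a passing depth test. */
    assert(depth == DEPTH_DISABLED || depth == DEPTH_PASSED);
    return (stencil_op >> 8) & 0xFu;          /* bits(11:8): Stencil Z Pass */
}

/* Merge a new stencil value into the stored one; only bit positions set in
 * stencilMode bits(23:16) are written. */
static uint8_t masked_stencil_write(uint8_t old_value, uint8_t new_value,
                                    uint32_t stencil_mode)
{
    uint8_t mask = (uint8_t)((stencil_mode >> 16) & 0xFFu);
    return (uint8_t)((old_value & ~mask) | (new_value & mask));
}
```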


Depending on whether the pixel fails the stencil test, fails the depth test, or passes the depth test, the 
stencil buffer value is updated as specified in the rules described above according to the operations 
specified in the table below: 


Stencil Operation 
POO Rep 
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| | eefault is 0). 


aaCtrl bits(13:0) specify the XY offset to the primary render buffers. Regardless of the setting of aaCtrl 
bit(28), all triangle XY coordinates for all vertices are added to the signed values stored in aaCtrl 
bits(13:0) to effectively “shift” all rendered triangles. When anti-aliased rendering is enabled (aaCtrl 
bit(28)=1), all rendering commands (triangle commands, Fastfill commands, and linear frame buffer 
writes) are all repeated a second time. The first time a rendering command is seen, it is processed 
unmodified (with the exception of the XY offsets specified in aaCtrl bits(13:0) being added to the triangle 
XY vertices, which is always enabled), using the target buffer offsets specified by the primary 
colBufferAddr and auxBufferAddr registers. However, when aaCtrl bit(28)=1, the rendered command 
will also be executed a second time, this time using the secondary colBufferAddr and auxBufferAddr 
registers to specify the target buffer offsets (see the colBufferAddr and auxBufferAddr registers 
descriptions for how the secondary buffer offsets are specified). When aaCtrl bit(28)=1 and a triangle 
command is received, the second triangle which is rendered into the secondary buffers will be offset using 
the XY offset values stored in aaCtrl bits(27:14). 


When anti-aliased rendering is enabled (aaCtrl bit(28)=1), triangles will only be rendered to the secondary 
buffer offsets when aaCtrl bit(30) is set. 


8.31 chipMask Register 


The chipMask register controls masking writes to an entire chip.


Bit    Description
0      Enable writes to chip #0 (1=enable). Default is 1.
1      Enable writes to chip #1 (1=enable). Default is 1.
2      Enable writes to chip #2 (1=enable). Default is 1.
3      Enable writes to chip #3 (1=enable). Default is 1.
4      Enable writes to chip #4 (1=enable). Default is 1.
5      Enable writes to chip #5 (1=enable). Default is 1.
6      Enable writes to chip #6 (1=enable). Default is 1.
7      Enable writes to chip #7 (1=enable). Default is 1.
8      Enable writes to chip #8 (1=enable). Default is 1.
9      Enable writes to chip #9 (1=enable). Default is 1.
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8.32 stipple Register 


The stipple register specifies a mask which is used to enable individual pixel writes to the RGB and depth 
buffers. See the stipple functionality description in the fbzMode register description for more information. 


Bit    Description
31:0   stipple value


8.33 color0 Register 

The color0 register specifies constant color values which are used for certain rendering functions. In 
particular, bits(23:0) of color0 are optionally used as the c_local input in the color combine unit. In 
addition, bits(31:24) of color0 are optionally used as the c_local input in the alpha combine unit. See the 
fbzColorPath register description for more information. 


7:0    Constant Color Blue
15:8   Constant Color Green
23:16  Constant Color Red
31:24  Constant Color Alpha
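
A minimal C sketch of packing and unpacking a constant color in this layout is shown below. The function names are hypothetical; the red and alpha positions come from the table, while the blue (7:0) and green (15:8) positions are assumed from the row order.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit components as alpha(31:24), red(23:16), green(15:8), blue(7:0). */
static uint32_t pack_constant_color(unsigned a, unsigned r, unsigned g, unsigned b)
{
    assert(a <= 255 && r <= 255 && g <= 255 && b <= 255);
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* bits(23:0): the RGB portion fed to the color combine unit. */
static uint32_t constant_color_rgb(uint32_t color)
{
    return color & 0x00FFFFFFu;
}

/* bits(31:24): the alpha portion fed to the alpha combine unit. */
static unsigned constant_color_alpha(uint32_t color)
{
    return (unsigned)(color >> 24);
}
```

The same packing applies to any register that uses this four-byte ARGB layout.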


8.34 color1 Register


The color1 register specifies constant color values which are used for certain rendering functions. In
particular, bits(23:0) of color1 are optionally used as the c_other input in the color combine unit selected
by bits(1:0) of fbzColorPath. The alpha component of color1 (bits(31:24)) is optionally used as the
a_other input in the alpha combine unit selected by bits(3:2) of fbzColorPath. The color1 register is
also used by the FASTFILL command as the constant color for screen clears. Also, for linear
frame buffer write format 15 (16-bit depth, 16-bit depth), the color for the pixel pipeline is taken from
color1 if the pixel pipeline is enabled for linear frame buffer writes (lfbMode bit(8)=1).


7:0    Constant Color Blue
15:8   Constant Color Green
23:16  Constant Color Red
31:24  Constant Color Alpha
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8.35 fogColor Register 


The fogColor register is used to specify the fog color for fogging operations. Fog is enabled by setting
bit(0) in fogMode. See the fogMode and fogTable register descriptions for more information on fog.


Bit    Description
7:0    Fog Color Blue
15:8   Fog Color Green
23:16  Fog Color Red
31:24  Reserved


8.36 zaColor Register 


The zaColor register is used to specify constant alpha and depth values for linear frame buffer writes, 
FASTFILL commands, and co-planar polygon rendering support. For certain linear frame buffer access 
formats, the alpha and depth values associated with a pixel written are the values specified in zaColor. See 
the lfbMode register description for more information. When executing the FASTFILL command, the
constant depth value written into the depth buffer is taken from zaColor. 


23:0   Constant Depth
31:24  Constant Alpha


8.37 chromaKey Register 


The chromaKey register specifies a color which is compared with all pixels to be written into the RGB 
buffer. If a color match is detected between an outgoing pixel and the chromaKey register, and chroma- 
keying is enabled (bit(1)=1 in the fbzMode register), then the pixel is not written into the frame buffer. An 
outgoing pixel will still be written into the RGB buffer if chroma-keying is disabled or the chromaKey 
color does not equal the outgoing pixel color. Note that the alpha color component of an outgoing pixel is 
ignored in the chroma-key color match circuitry. The chroma-key comparison is performed immediately 
after texture lookup, but before lighting, fog, or alpha blending. See the description of the fbzColorPath 
register for further information on the location of the chroma-key comparison circuitry. 


7:0    Chroma-key Blue
15:8   Chroma-key Green
23:16  Chroma-key Red


8.38 chromaRange Register 


The chromaRange register specifies a 24-bit RGB color value which is compared to all pixels to be written
to the color buffer. If chroma-keying is enabled (fbzMode[1]) and chroma-ranging is enabled 
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(chromaRange[28]), the outgoing pixel color is compared to a color range formed by the colors of the 
chromaKey and chromaRange registers. 


Each RGB color component of the chromaKey and chromaRange registers defines a chroma range for that
color component. The color component range includes the lower limit color from the chromaKey register
and the upper limit color from the chromaRange register. Software must program the lower limits less
than or equal to the upper limits.


Each RGB color component chromaRange mode defines the color component range as inclusive or 
exclusive. Inclusive ranges prohibit colors within the range and exclusive ranges prohibit colors outside of 
the range. 


Prohibited colors are blocked from the frame buffer based on the chromaRange mode. This mode may be
set to “intersection” or “union”. The intersection mode blocks pixels prohibited by all of the color
components and the union mode blocks pixels prohibited by any of the color components.
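
The range test described above can be modeled in C roughly as follows. This is a sketch of the stated rules, not of the hardware: the function name and the `exclusive_mask` and `union_mode` parameters are hypothetical stand-ins for the chromaRange mode bits, and colors are passed as 24-bit values with red in bits(23:16), green in bits(15:8), and blue in bits(7:0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the pixel would be blocked from the frame buffer.
 * key            lower limits (chromaKey color)
 * range          upper limits (chromaRange color)
 * exclusive_mask hypothetical: bit 0/1/2 set = blue/green/red range is exclusive
 * union_mode     false = intersection, true = union */
static bool chroma_range_blocks(uint32_t key, uint32_t range,
                                unsigned exclusive_mask, bool union_mode,
                                uint32_t pixel)
{
    int prohibited = 0;
    for (int shift = 0; shift <= 16; shift += 8) {
        unsigned lo = (key   >> shift) & 0xFFu;
        unsigned hi = (range >> shift) & 0xFFu;
        unsigned v  = (pixel >> shift) & 0xFFu;
        bool exclusive = ((exclusive_mask >> (shift / 8)) & 1u) != 0;
        assert(lo <= hi);  /* software must program lower <= upper */
        bool inside = (v >= lo && v <= hi);
        /* Inclusive ranges prohibit colors inside; exclusive ranges prohibit
         * colors outside. */
        if (exclusive ? !inside : inside)
            prohibited++;
    }
    /* Intersection: all components must prohibit; union: any one suffices. */
    return union_mode ? (prohibited > 0) : (prohibited == 3);
}
```

For example, with limits 0x101010 to 0x202020 and all ranges inclusive, a pixel of 0x150515 is blocked in union mode (red and blue fall inside their ranges) but passes in intersection mode (green does not).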


30:29  Enable texture chroma substitution (0x3=enable)

8.39 userIntrCMD Register 
Writing to the userIntrCMD register executes the USERINTERRUPT command: 


Bit    Description
0      Generate USERINTERRUPT interrupt (1=generate interrupt)
1      Wait for interrupt generated by USERINTERRUPT (visible in intrCtrl bit(11)) to be
       cleared before continuing (1=stall graphics engine until interrupt is cleared)
9:2    User interrupt Tag


If the data written to userIntrCMD bit(0)=1, then a user interrupt is generated (intrCtrl bit(11) is set to 1).
If the data written to userIntrCMD bit(1)=1, then the graphics engine stalls and waits for the 
USERINTERRUPT interrupt to be cleared before continuing processing additional commands. If no 
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USERINTERRUPT interrupt is set and the data written to userIntrCMD bit(1)=1, then the graphics 
engine will not stall and will continue to process additional commands. Software may also use 
combinations of intrCtrl bits(1:0) to generate different functionality. 


The tag associated with a user interrupt is written to userIntrCMD bits(9:2). When a user interrupt is
generated, the respective tag associated with the user interrupt is read from intrCtrl bits(19:12).


If the USERINTERRUPT command does not stall the graphics engine (userIntrCMD(1)=0), then a
potential race condition occurs between multiple USERINTERRUPT commands and software user
interrupt processing. In particular, multiple USERINTERRUPT commands may be generated before
software is able to process the first interrupt. Irrespective of how many user interrupts have been
generated, the user interrupt tag field in intrCtrl (bits 19:12) always reflects the tag of the last
USERINTERRUPT command processed. As a result of this behavior, early tags from multiple
USERINTERRUPT commands may be lost. To avoid this behavior, software may force a single
USERINTERRUPT command to be executed at a time by writing userIntrCMD(1:0)=0x3 and cause the
graphics engine to stall until the USERINTERRUPT interrupt is cleared.


Note that bit 5 of intrCtrl must be set to 1 for user interrupts to be generated. Writes to userIntrCMD
when intrCtrl(5)=0 do not generate interrupts or cause the processing of commands to wait on clearing of
the USERINTERRUPT command (regardless of the data written to userIntrCMD), and are thus in effect
“dropped.”
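
A small C sketch of composing a userIntrCMD value and reading the tag back from intrCtrl follows. The helper names are hypothetical. The sketch assumes that bit(0)=1 requests the interrupt, which is consistent with the 0x3 value given above for the generate-and-stall form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USERINTR_GENERATE (1u << 0)  /* assumed: 1 = generate user interrupt */
#define USERINTR_STALL    (1u << 1)  /* 1 = stall until interrupt is cleared */

/* Build the data word to write to userIntrCMD; the tag goes in bits(9:2). */
static uint32_t make_user_intr_cmd(bool generate, bool stall, unsigned tag)
{
    assert(tag <= 0xFFu);
    return (generate ? USERINTR_GENERATE : 0u) |
           (stall    ? USERINTR_STALL    : 0u) |
           ((uint32_t)tag << 2);
}

/* intrCtrl bit(11): a user interrupt has been generated. */
static bool user_intr_pending(uint32_t intr_ctrl)
{
    return ((intr_ctrl >> 11) & 1u) != 0;
}

/* intrCtrl bits(19:12): tag of the last USERINTERRUPT command processed. */
static unsigned user_intr_tag(uint32_t intr_ctrl)
{
    return (intr_ctrl >> 12) & 0xFFu;
}
```

Passing `stall = true` on every command corresponds to the 0x3 form recommended above when no tag may be lost.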


8.40 colBufferAddr 


The colBufferAddr register defines the base address of the color buffer. The address must be 16-byte 
aligned, so colBufferAddr[3:0] are unused. 


Color Buffer Base Address. Must be 16-byte aligned 


8.41 colBufferStride 


If the color buffer is linear (colBufferStride[15]=0) then colBufferStride[13:0] defines the linear stride of 
the color buffer in bytes. Linear stride must be 16-byte aligned. If the color buffer is tiled 
(colBufferStride[15]=1) then colBufferStride[6:0] defines the tile stride for the color buffer in tiles. 


13:0   if [15]=0 then
           linear: [13:0] = linear stride in bytes
       else
           tiled: [6:0] = tile stride in tiles; [13:7] are reserved.
15     Memory type (0=linear; 1=tiled)
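
The two stride encodings can be sketched in C as below. The helper names are hypothetical; the alignment and field-width checks follow the constraints stated above, and the same encoding is described for auxBufferStride.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STRIDE_TILED (1u << 15)  /* bit(15): 0=linear, 1=tiled */

/* Linear buffer: bits(13:0) hold the stride in bytes, 16-byte aligned. */
static uint32_t linear_stride(unsigned bytes)
{
    assert(bytes % 16 == 0 && bytes <= 0x3FFFu);
    return bytes;
}

/* Tiled buffer: bits(6:0) hold the stride in tiles; bits(13:7) stay zero. */
static uint32_t tiled_stride(unsigned tiles)
{
    assert(tiles <= 0x7Fu);
    return STRIDE_TILED | tiles;
}

static bool stride_is_tiled(uint32_t reg)
{
    return (reg & STRIDE_TILED) != 0;
}
```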
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8.42 auxBufferAddr 


The auxBufferAddr register defines the base address of the auxiliary buffer. The existence and enabling of 
the depth or the alpha auxiliary buffers is established within the fbzMode register. auxBufferAddr must be
16-byte aligned, so auxBufferAddr[3:0] are unused.


Description 
reserved 
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8.43 auxBufferStride 


If the aux buffer is linear (auxBufferStride[15]=0) then auxBufferStride[13:0] defines the linear stride of 
the aux buffer in bytes. Linear stride must be 16-byte aligned. If the aux buffer is tiled 
(auxBufferStride[15]=1) then auxBufferStride[6:0] defines the tile stride for the aux buffer in tiles. 


Bit    Description
13:0   if [15]=0 then
           linear: [13:0] = linear stride in bytes
       else
           tiled: [6:0] = tile stride in tiles; [13:7] are reserved.
14     reserved
15     Memory type (0=linear; 1=tiled)


8.44  clipLeftRight and clipTopBottom Registers 


The clipLeftRight and clipTopBottom registers specify a rectangle within which all drawing operations
are confined. If a pixel is to be drawn outside the clip rectangle, it will not be written into the RGB or
depth buffers. Note that the specified clipping rectangle defines a valid drawing area in both the RGB and
depth/alpha buffers. The values in the clipping registers are given in pixel units, and the valid drawing
rectangle is inclusive of the clipLeft and clipTop register values, but exclusive of the clipRight and
clipBottom register values. clipTop must be less than clipBottom, and clipLeft must be less than
clipRight. The clipLeftRight and clipTopBottom registers are enabled by setting bit(0) in the
fbzMode register. When clipping is enabled, the bounding clipping rectangle must always be less than or
equal to the screen resolution in order to clip to screen coordinates. Also note that if clipping is not
enabled, rendering must not be specified to occur outside of the screen resolution.


Important Note: The clipTopBottom register is defined such that y=0 always resides at the top of the 
monitor screen. Changing the value of the Y origin bits (fbzMode bit(17) or lfbMode bit(13)) has no
effect on the clipTopBottom register orientation. As a result, if the Y origin is defined to be at the bottom
of the screen (by setting one of the Y origin bits), care must be taken in setting the clipTopBottom register 
to ensure proper functionality. In the case where the Y origin is defined to be at the bottom of the screen, 
the value of clipTopBottom is usually set as the number of scan lines in the monitor resolution minus the 
desired Y clipping values. 


The clipLeftRight and clipTopBottom registers are also used to define a rectangular region to be drawn 
during a FASTFILL command. Note that when clipTopBottom is used to specify a rectangular region for 
the FASTFILL command, the orientation of the Y origin (top or bottom of the screen) is defined by the 
status of fbzMode bit(17). See section 7 and the fastfillCMD register description for more information on 
the FASTFILL command. 


clipLeftRight Register

Bit    Description
11:0   Unsigned integer specifying right clipping rectangle edge (clipRight)
15:12  reserved
27:16  Unsigned integer specifying left clipping rectangle edge (clipLeft)
31:28  reserved


clipTopBottom Register

Bit    Description
11:0   Unsigned integer specifying bottom clipping rectangle edge (clipBottom)
15:12  reserved
27:16  Unsigned integer specifying top clipping rectangle edge (clipTop)
31:28  reserved
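
The following C sketch packs the clip registers, tests a pixel against the rectangle, and converts a bottom-origin Y range into the top-origin values the clipTopBottom register expects. All names are hypothetical. The sketch assumes the right/bottom edge occupies bits(11:0) and the left/top edge occupies bits(27:16), as the reserved fields at 15:12 and 31:28 suggest, and the flipped helper is one reading of "scan lines minus the desired Y clipping values".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* low_edge  = clipLeft or clipTop     -> bits(27:16), inclusive
 * high_edge = clipRight or clipBottom -> bits(11:0),  exclusive */
static uint32_t pack_clip_pair(unsigned low_edge, unsigned high_edge)
{
    assert(low_edge < high_edge && high_edge <= 0xFFFu);
    return ((uint32_t)low_edge << 16) | high_edge;
}

/* True when (x, y), given with y=0 at the top, lies inside the rectangle. */
static bool pixel_inside_clip(uint32_t clip_lr, uint32_t clip_tb,
                              unsigned x, unsigned y)
{
    unsigned left = (clip_lr >> 16) & 0xFFFu, right  = clip_lr & 0xFFFu;
    unsigned top  = (clip_tb >> 16) & 0xFFFu, bottom = clip_tb & 0xFFFu;
    return x >= left && x < right && y >= top && y < bottom;
}

/* Y origin at the bottom of the screen: map the bottom-origin range
 * [y_low, y_high) onto top-origin clipTop/clipBottom values. */
static uint32_t pack_clip_top_bottom_flipped(unsigned screen_height,
                                             unsigned y_low, unsigned y_high)
{
    assert(y_low < y_high && y_high <= screen_height);
    return pack_clip_pair(screen_height - y_high, screen_height - y_low);
}
```

For a 480-line display, the bottom 100 scan lines in bottom-origin coordinates (0 up to 100) become clipTop=380 and clipBottom=480.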


8.45 clipLeftRight1, clipTopBottom1 Registers 


The registers specify two 
rectangular regions which restrict drawing operation. The secondary clip rectangles may be defined as 
inclusive or exclusive through the clipMode field of the clipTopBottom1 register. An inclusive rectangle 
allows drawing within the rectangle and an exclusive rectangle disallows drawing within the rectangle. 
Drawing within an excluded region of either of the clip rectangles circumvents the write of pixels into both 
the color and auxiliary buffers. 


The clip registers define the four corners of a rectangular region in window relative pixel coordinates
(native x/y rendering coordinates). The value of clipTop1 must be less than clipBottom1 and the value of
clipLeft1 must be less than clipRight1. This programming results in a rectangular region including the
clipLeft1 and clipTop1 register values, but excluding the clipRight1 and clipBottom1 register values.


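clipLeftRight1 Register

Bit    Description
11:0   Unsigned integer specifying right clipping rectangle edge (clipRight1)
15:12  reserved
27:16  Unsigned integer specifying left clipping rectangle edge (clipLeft1)
30:28  reserved
31     Clip Enable (0=disable, 1=enable)


clipTopBottom1 Register

Bit    Description
11:0   Unsigned integer specifying bottom clipping rectangle edge (clipBottom1)
15:12  reserved
27:16  Unsigned integer specifying top clipping rectangle edge (clipTop1)
30:28  reserved
31     Clip Mode (0=inclusive, 1=exclusive)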


8.46 fogTable Register 


The fogTable register is used to implement fog functions in Napalm. The fogTable register is a 64-entry
lookup table consisting of 8-bit fog blending factors and 8-bit Δfog blending factors. The Δfog blending
factors are the difference between successive fog blending factors in fogTable and are used to blend
between fogTable entries. Note that the Δfog blending factors are stored in 6.2 format, while the fog
blending factors are stored in 8.0 format. For most applications, the 6.2 format Δfog blending factors will
have the two LSBs set to 0x0, with the six MSBs representing the difference between successive fog
blending factors. Also note that as a result of the 6.2 format for the Δfog blending factors, the difference
between successive fog blending factors cannot exceed 63. When storing the fog blending factors, the sum
of each fog blending factor and Δfog blending factor pair must not exceed 255. When loading fogTable,
two fog table entries must be written concurrently in a 32-bit word. A total of 32 32-bit PCI writes are
required to load the entire fogTable register.


fogTable[n] (0 ≤ n ≤ 31)

Bit    Description
7:0    FogTable[2n] Δfog blending factor
15:8   FogTable[2n] fog blending factor
23:16  FogTable[2n+1] Δfog blending factor
31:24  FogTable[2n+1] fog blending factor
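
As an illustration of the packing rules, the C sketch below builds one 32-bit fogTable word from a 64-entry array of fog blending factors. The function names are hypothetical. It assumes a non-decreasing table, a Δfog of zero for the final entry, and byte positions that follow the row order of the table (entry 2n in the low half-word, entry 2n+1 in the high half-word).

```c
#include <assert.h>
#include <stdint.h>

#define FOG_TABLE_ENTRIES 64

/* Delta to the next entry in 6.2 format (two fractional LSBs left at 0). */
static uint8_t fog_delta(const uint8_t fog[FOG_TABLE_ENTRIES], int i)
{
    int next = (i + 1 < FOG_TABLE_ENTRIES) ? fog[i + 1] : fog[i];
    int diff = next - fog[i];
    assert(diff >= 0 && diff <= 63);  /* limit imposed by the 6.2 format */
    return (uint8_t)(diff << 2);
}

/* Pack entries 2n and 2n+1 into the word written to fogTable[n]. */
static uint32_t pack_fog_word(const uint8_t fog[FOG_TABLE_ENTRIES], int n)
{
    assert(n >= 0 && n < FOG_TABLE_ENTRIES / 2);
    int e0 = 2 * n, e1 = 2 * n + 1;
    return ((uint32_t)fog[e1]            << 24) |
           ((uint32_t)fog_delta(fog, e1) << 16) |
           ((uint32_t)fog[e0]            << 8)  |
            (uint32_t)fog_delta(fog, e0);
}
```

Because each delta is derived from the next 8-bit entry, the sum of a fog factor and the integer part of its Δfog never exceeds 255 in this sketch. Loading the full table would call `pack_fog_word` for n = 0 through 31 and write each result to the corresponding fogTable address.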


8.47  fbiPixelsIn Register 


The fbiPixelsIn register is a 24-bit counter which is incremented for each pixel processed by the Napalm
triangle walking engine. fbiPixelsIn is incremented irrespective of whether the triangle pixel is actually
drawn as a result of the depth test, alpha test, etc. fbiPixelsIn is used primarily for statistical information, and
in essence allows software to count the number of pixels in a screen-space triangle. fbiPixelsIn is reset to
0x0 on power-up reset, and is reset when a ‘1’ is written to the lsb of nopCMD.


Bit Description 


23:0 Pixel Counter (number of pixels processed by Napalm triangle engine) 


8.48  fbiChromaFail Register 


The fbiChromaFail register is a 24-bit counter which is incremented each time an incoming source pixel
(either from the triangle engine or linear frame buffer writes through the pixel pipeline) is invalidated in
the pixel pipeline because of the chroma-key color match test. If an incoming source pixel color matches
the chromaKey register, fbiChromaFail is incremented. fbiChromaFail is reset to 0x0 on power-up reset,
and is reset when a ‘1’ is written to the lsb of nopCMD.


Bit Description 


23:0 Pixel Counter (number of pixels failed chroma-key test) 


8.49 fbiZfuncFail Register 


The fbiZfuncFail register is a 24-bit counter which is incremented each time an incoming source pixel 
(either from the triangle engine or linear frame buffer writes through the pixel pipeline) is invalidated in 
the pixel pipeline because of a failure in the Z test. The Z test is defined and enabled in the fbzMode 
register. fbiZfuncFail is reset to 0x0 on power-up reset, and is reset when a ‘1’ is written to the lsb of
nopCMD.


Bit Description 


23:0 Pixel Counter (number of pixels failed Z test)


8.50 fbiAfuncFail Register


The fbiAfuncFail register is a 24-bit counter which is incremented each time an incoming source pixel 
(either from the triangle engine or linear frame buffer writes through the pixel pipeline) is invalidated in 
the pixel pipeline because of a failure in the alpha test. The alpha test is defined and enabled in the 
alphaMode register. The fbiAfuncFail register is also incremented if an incoming source pixel is 
invalidated in the pixel pipeline as a result of the alpha masking test (bit(13) in fbzMode). fbiAfuncFail is 
reset to 0x0 on power-up reset, and is reset when a ‘1’ is written to the lsb of nopCMD.
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Bit Description 
23:0 Pixel Counter (number of pixels failed Alpha test)


8.52 fbiPixelsOut Register 


The fbiPixelsOut register is a 24-bit counter which is incremented each time a pixel is written into a color 
buffer during rendering operations (rendering operations include triangle commands, linear frame buffer 
writes, and the FASTFILL command). Pixels tracked by fbiPixelsOut are therefore subject to the chroma-key
test, Z test, Alpha test, etc. that are part of the regular Napalm pixel pipeline. fbiPixelsOut is used to count
the number of pixels actually drawn (as opposed to the number of pixels processed counted by
fbiPixelsIn). Note that the RGB mask (fbzMode bit(9)) is ignored when determining fbiPixelsOut.
fbiPixelsOut is reset to 0x0 on power-up reset, and is reset when a ‘1’ is written to the lsb of nopCMD.


Bit Description
23:0 Pixel Counter (number of pixels drawn to color buffer)


8.53 swapBufferPend Register 


Writes to the swapBufferPend register increment the swap buffer pending count of the Napalm status
register. Writes take effect immediately and are available only through direct access.


8.54  leftOverlayBuf Register 


Starting address of the left (or monocular) buffer for overlay display. For video overlay, the start
address needs to be aligned on a 32-bit boundary for YUV 422 pixel format and a 64-bit boundary for 
YUV 411 pixel format. This register is sampled at the end of vertical retrace. 


8.55 RightOverlayBuf Register 


Starting address of the right buffer for overlay display. For video overlay, the start address needs to be
aligned on a 32-bit boundary for YUV 422 pixel format and a 64-bit boundary for YUV 411 pixel format. 
This register is only used for stereo buffering. This register is sampled at the end of vertical retrace. 


Bit Description 
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Starting address of the overlay surface buffer 0. If overlay surface resides in linear 
space, the address is the physical address. 


Starting address of left or Monocular buffer address for desktop display. This register is sampled at the 
end of vertical retrace. 


8.57  fbiSwapHistory Register 


The fbiSwapHistory register keeps track of the number of vertical syncs which occur between executed 
swap commands. fbiSwapHistory logs this information for the last 8 executed swap commands. Upon 
completion of a swap command, fbiSwapHistory bits (27:0) are shifted left by four bits to form the new 
fbiSwapHistory bits (31:4), which maintains a history of the number of vertical syncs between execution 
of each swap command for the last 7 frames. Then, fbiSwapHistory bits(3:0) are updated with the number 
of vertical syncs which occurred between the last swap command and the just completed swap command or 
the value 0xf, whichever is less.


Bit    Description
3:0    Number of vertical syncs between the second most recently completed swap command
       and the most recently completed swap command, or the value 0xf, whichever is less, for
       Frame N.
7:4    Vertical sync swapbuffer history for Frame N-1
11:8   Vertical sync swapbuffer history for Frame N-2
15:12  Vertical sync swapbuffer history for Frame N-3
19:16  Vertical sync swapbuffer history for Frame N-4
23:20  Vertical sync swapbuffer history for Frame N-5
27:24  Vertical sync swapbuffer history for Frame N-6
31:28  Vertical sync swapbuffer history for Frame N-7
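
The shift-and-saturate update described for fbiSwapHistory can be modeled in C as shown below. This is a software model of the stated behavior for illustration only; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the update performed when a swap command completes: bits(27:0)
 * move up to bits(31:4) and the new count, saturated at 0xf, fills bits(3:0). */
static uint32_t swap_history_update(uint32_t history, unsigned vsyncs)
{
    uint32_t newest = (vsyncs < 0xFu) ? vsyncs : 0xFu;
    return (history << 4) | newest;  /* the oldest nibble falls off the top */
}

/* Read the vertical sync count for Frame N-frames_ago (0 = most recent). */
static unsigned swap_history_frame(uint32_t history, unsigned frames_ago)
{
    assert(frames_ago < 8);
    return (history >> (4 * frames_ago)) & 0xFu;
}
```

Software reading the register could, for instance, average the eight nibbles to estimate how many vertical syncs each recent frame took.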


8.58  fbiTrianglesOut Register 


The fbiTrianglesOut register is a 24-bit counter which is incremented for each triangle processed by the
Napalm triangle walking engine. Triangles which are backface culled in the triangle setup unit do not 
increment fbiTrianglesOut. fbiTrianglesOut is reset to 0x0 on power-up reset, and is also reset to 0x0 
when a ‘1’ is written to nopCMD bit(1). 


23:0 Rendered triangles (total number of triangles rendered by Napalm triangle rendering
engine)


8.59 sSetupMode Register 


The sSetupMode register provides a way for the CPU to set up only the required parameters. When a bit is set,
that parameter will be calculated in the setup process; otherwise the value is not passed down to the
triangle, and the previous value will be used. The triangle strip behavior is defined in bits
21:16, where bit 16 selects fan mode. Culling is enabled by setting bit 17 to a value of “1”, whereas bit 18
defines the culling sign. Bit 19 disables the ping pong sign inversion that happens during triangle strips.


Bit    Description
0      Setup Red, Green, and Blue
1      Setup Alpha
2      Setup Z
3      Setup Wb
4      Setup W0
5      Setup S0 and T0
6      Setup W1
7      Setup S1 and T1
15:8   reserved
16     Strip mode (0=strip, 1=fan)
17     Enable Culling (0=disable, 1=enable)
18     Culling Sign (0=positive sign, 1=negative sign)
19     Disable ping pong sign correction during triangle strips (0=normal, 1=disable)
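
A possible set of C definitions for composing an sSetupMode value is sketched below. The macro and function names are hypothetical and do not correspond to the constants used in the sample pseudo code later in this section. Bits 16 through 19 follow the text above; the parameter bits 0 through 7 are assumed from the order of the table rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Parameter enables (bit positions assumed from table row order). */
#define SETUP_MODE_RGB         (1u << 0)
#define SETUP_MODE_ALPHA       (1u << 1)
#define SETUP_MODE_Z           (1u << 2)
#define SETUP_MODE_WB          (1u << 3)
#define SETUP_MODE_W0          (1u << 4)
#define SETUP_MODE_S0T0        (1u << 5)
#define SETUP_MODE_W1          (1u << 6)
#define SETUP_MODE_S1T1        (1u << 7)

/* Strip and culling controls (bit positions stated in the text). */
#define SETUP_MODE_FAN         (1u << 16)
#define SETUP_MODE_CULL_ENABLE (1u << 17)
#define SETUP_MODE_CULL_NEG    (1u << 18)
#define SETUP_MODE_NO_PINGPONG (1u << 19)

/* Combine parameter enables with strip/fan and culling selections.
 * The culling sign is only set when culling itself is enabled. */
static uint32_t make_setup_mode(uint32_t params, bool fan,
                                bool cull, bool cull_negative)
{
    assert((params & ~0xFFu) == 0);  /* only parameter bits(7:0) allowed */
    uint32_t mode = params;
    if (fan)
        mode |= SETUP_MODE_FAN;
    if (cull)
        mode |= SETUP_MODE_CULL_ENABLE;
    if (cull && cull_negative)
        mode |= SETUP_MODE_CULL_NEG;
    return mode;
}
```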


8.60 Triangle Setup Vertex Registers 


The sVx, sVy registers specify the x and y coordinates of a triangle strip to be rendered. A triangle strip, 
once the initial triangle has been defined, only requires a new X and Y to render consecutive triangles. The 
diagram below illustrates how triangle strips are sent over to Napalm. 


[Figure: Triangle Strip and Triangle Fan, with vertices labeled R and D1 through D5]


Triangle strips and triangle fans are implemented in Napalm by common vertex information and 2 triangle
commands. Vertex information is written to Napalm for the current vertex and is followed by a write to
either sBeginTriCMD or sDrawTriCMD. For example, to render the triangle strip in the above
figure, parameters X, Y, ARGB, W0, S/W, T/W for vertex R would be written followed by a write to
sBeginTriCMD. Vertex D1’s parameters would next be written followed by a write to sDrawTriCMD.
After D2’s data has been sent and the 2nd write to sDrawTriCMD has been completed, Napalm will begin to
render triangle 1. As triangle 1 is being rendered, data for vertex D3 will be sent down followed by
another write to sDrawTriCMD, thus launching another triangle. Triangle fans are very similar to triangle
strips. Instead of changing all three vertices, only the last 2 get modified. Triangle fans start with a
sBeginTriCMD just as the triangle strip did, and send down sDrawTriCMD for every new vertex. To
select triangle fan or triangle strip, you must write bit 16 of the sSetupMode register.


sVx Register


Bit Description
31:0 Vertex coordinate information (IEEE 32 bit single-precision floating point format)
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sVy Register 


Description 


Vertex coordinate information (IEEE 32 bit single-precision floating point format) 


8.61 sARGB Register 
The sARGB register specifies the color at the current vertex as a packed 32-bit value.


Bit Description 


Alpha Color 
Red Color 
Green Color 


8.62 sRed Register 


The sRed register is the separated red value for the current vertex.


Red value at vertex (0.0 - 255.0). (IEEE 32 bit single-precision floating point format) 


8.63 sGreen Register 


The sGreen register is the separated green value for the current vertex. 


Green value at vertex (0.0 - 255.0). (IEEE 32 bit single-precision floating point format) 


8.64 sBlue Register 


The sBlue register is the separated blue value for the current vertex. 


Blue value at vertex (0.0 - 255.0). (IEEE 32 bit single-precision floating point format) 


8.65 sAlpha Register 


The sAlpha register is the separated alpha value for the current vertex.


Alpha value at vertex (0.0 - 255.0). (IEEE 32 bit single-precision floating point format) 


8.66 sVz Register 


The sVz register is the Z value at the current vertex.
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Bit Description 
Vertex coordinate information (IEEE 32 bit single-precision floating point format) 


8.67 sWb Register 
The sWb register is a global 1/W that is sent to both the FBI and all TMUs.


Description 


Global 1/W. (IEEE 32 bit single-precision floating point format). 


8.68 sWtmu0 Register 
The sWtmu0 register is the local 1/W value for the current vertex, for all TMUs.


Texture local 1/W. (IEEE 32 bit single-precision floating point format) 


8.69 sS/W0 Register
The sS/W0 register is the S coordinate of the current vertex divided by W, for all TMUs.


Texture S coordinate (IEEE 32 bit single-precision floating point format) 


8.70 sT/W0 Register
The sT/W0 register is the T coordinate of the current vertex divided by W, for all TMUs.


Texture T coordinate (IEEE 32 bit single-precision floating point format) 


8.71 sWtmu1 Register
The sWtmu1 register is TMU1’s local 1/W value for the current vertex.


Texture local 1/W. (IEEE 32 bit single-precision floating point format) 


8.72 sS/Wtmu1 Register
The sS/Wtmu1 register is TMU1’s local S/W value for the current vertex.


Texture local S/W. (IEEE 32 bit single-precision floating point format)


8.73 sT/Wtmu1 Register
The sT/Wtmu1 register is TMU1’s local T/W value for the current vertex.


Bit 
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Texture local 1/W. (IEEE 32 bit single-precision floating point format) 


8.74 sDrawTriCMD Register 
The sDrawTriCMD register starts the draw process.


Bit Description 
Draw triangle 


8.75 sBeginTriCMD Register 
A write to this register begins a new triangle strip starting with the current vertex. No actual drawing is 
performed. 


Bit Description 
Begin New triangle


The following two figures are sample pseudo code for generating triangle strips and fans.


Setup Code 


// Packed color triangle strip setup.
write (sst->sSetupMode, PACKEDCOLOR | SETUP_XY | SETUP_RGB | SETUP_ALPHA | SETUP_ST);

// Begin triangle setup
// Vertex #0
write (sst->sVx, -30.0);
write (sst->sVy, 15.0);
write (sst->sARGB, 0xFF010203); // Color
write (sst->sSw, 4.0);
write (sst->sTw, 2.0);
write (sst->sBeginTriCMD, 0); // Begin Triangle

// Vertex #1
write (sst->sVx, 5.0);
write (sst->sVy, 10.0);
write (sst->sARGB, 0x00052377);
write (sst->sSw, 30.0);
write (sst->sTw, 60.0);
write (sst->sDrawTriCMD, 0);

// Vertex #2
write (sst->sVx, 50.0);
write (sst->sVy, 100.0);
write (sst->sARGB, 0x12345678);
write (sst->sSw, 100.0);
write (sst->sTw, 200.0);
write (sst->sDrawTriCMD, 0); // Draw first triangle

// Vertex #3
write (sst->sVx, 50.0);
write (sst->sVy, 0.0);
write (sst->sARGB, 0x87654321);
write (sst->sSw, 0.0);
write (sst->sTw, 200.0);
write (sst->sDrawTriCMD, 0); // Draw second triangle

// Vertex #4
write (sst->sVx, 100.0);
write (sst->sVy, 100.0);
write (sst->sARGB, 0x0);
write (sst->sSw, 200.0);
write (sst->sTw, 150.0);
write (sst->sDrawTriCMD, 0); // Draw third triangle
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// Separate Color triangle fan setup 
write (sst->sSetupMode, FANMODE | SETUP_XY | SETUP_RGB); 


// Vertex #0 

write (sst->s Vx, -30.0); 

write (sst->sVy, 15.0); 

write (sst->sRed, 0.0); 

write (sst->sGreen, 0.0); 

write (sst->sBlue, 0.0); 

write (sst->sBegintriCMD, 0); // Begin Triangle 


// vertex #1 

write (sst->sVx, 5.0); 

write (sst->sVy, 10.0); 

write (sst->sRed, 255.0); 
write (sst->sGreen, 0.0); 

write (sst->sBlue, 0.0); 

write (sst->sDrawTriCMD, 0); 


// Vertex #2 

write (sst->sVx, 50.0); 

write (sst->sVy, 100.0); 

write (sst->sRed, 0.0); 

write (sst->sGreen, 255.0); 

write (sst->sBlue, 0.0); 

write (sst->sDrawTriCMD, 0); // Draw first triangle 


// Vertex #3 

write (sst->sVx, 50.0); 

write (sst->sVy, 0.0); 

write (sst->sRed, 0.0); 

write (sst->sGreen, 0.0); 

write (sst->sBlue, 255.0); 

write (sst->sDrawTriCMD, 0); // Draw second triangle 


// Vertex #4 

write (sst->sVx, 100.0); 

write (sst->sVy, 100.0); 

write (sst->sRed, 255.0); 

write (sst->sGreen, 255.0); 

write (sst->sBlue, 0.0); 

write (sst->sDrawTriCMD, 0); // Draw second triangle 
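
The difference between the two setup modes can be illustrated on the host side. The sketch below is not Napalm code: it assumes the conventional definitions of a strip (triangle i uses vertices i, i+1, i+2) and a fan (triangle i uses vertices 0, i+1, i+2), and the names tri_indices and list_triangles are hypothetical. Winding order is not modeled.

```c
#include <assert.h>

/* Vertex indices of one triangle. */
typedef struct { int v[3]; } tri_indices;

/* Lists the triangles formed by a sequence of vertex_count vertices.
 * fan == 0: strip, triangle i uses vertices i, i+1, i+2.
 * fan != 0: fan,   triangle i uses vertices 0, i+1, i+2.
 * Returns the number of triangles written to out (at most max_out). */
static int list_triangles(int vertex_count, int fan, tri_indices *out, int max_out)
{
    assert(vertex_count >= 0 && max_out >= 0);
    int count = 0;
    for (int i = 0; i + 2 < vertex_count && count < max_out; i++) {
        out[count].v[0] = fan ? 0 : i;
        out[count].v[1] = i + 1;
        out[count].v[2] = i + 2;
        count++;
    }
    return count;
}
```

With the five vertices used in the pseudo code above, both modes yield three triangles; the last one is (2, 3, 4) for the strip and (0, 3, 4) for the fan.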


8.76 textureMode Register 


The textureMode register controls texture mapping functionality including perspective correction, texture 
filtering, texture clamping, and multiple texture blending. 


Bit     Name                Description
0       tpersp_st           Enable perspective correction for S and T iterators (0=linear interpolation of S,T,
                            force W to 1.0, 1=perspective correct, S/W, T/W)
1       tminfilter          Texture minification filter (0=point-sampled, 1=bilinear)
2       tmagfilter          Texture magnification filter (0=point-sampled, 1=bilinear)
3       tclampw             Clamp when W is negative (0=disabled, 1=force S=0, T=0 when W is negative)
4       tloddither          Enable Level-of-Detail dithering (0=no dither, 1=dither)
5       tnccselect          NCC lookup table select
6       tclamps             Clamp S Iterator (0=wrap, 1=clamp)
7       tclampt             Clamp T Iterator (0=wrap, 1=clamp)
11:8    tformat             Texture format (see table below)
                            Texture Color Combine Unit control (RGB):
12      tc_zero_other       Zero Other (0=c_other, 1=zero)
13      tc_sub_clocal       Subtract Color Local (0=zero, 1=c_local)
16:14   tc_mselect          Mux Select (0=zero, 1=c_local, ..., 5=LOD_frac, ...)
17      tc_reverse_blend    Reverse Blend (0=normal blend, 1=reverse blend)
18      tc_add_clocal       Add Color Local
19      tc_add_alocal       Add Alpha Local
20      tc_invert_output    Invert Output
                            Texture Alpha Combine Unit control (A):
21      tca_zero_other      Zero Other (0=c_other, 1=zero)
22      tca_sub_clocal      Subtract Color Local (0=zero, 1=c_local)
25:23   tca_mselect         Mux Select (0=zero, 1=c_local, ..., 5=LOD_frac, ...)
26      tca_reverse_blend   Reverse Blend (0=normal blend, 1=reverse blend)
27      tca_add_clocal      Add Color Local
28      tca_add_alocal      Add Alpha Local
29      tca_invert_output   Invert Output

tpersp_st bit of textureMode enables perspective correction for S and T iterators. Note that there is no 
performance penalty for performing perspective corrected texture mapping. 


tminfilter, tmagfilter bits of textureMode specify the filtering operation to be performed. When point
sampled filtering is selected, the texel specified by <s,t> is read from texture memory. When bilinear
filtering is selected, the four closest texels to a given <s,t> are read from memory and blended together as a
function of the fractional components of <s,t>. tminfilter is referenced when LOD>=LODmin, otherwise
tmagfilter is referenced.


tclampw bit of textureMode is used when projecting textures to avoid projecting behind the source of the
projection. If this bit is set, S and T are each forced to zero when W is negative. Though usually desirable, it
is not necessary to set this bit when doing projected textures.


tloddither bit of textureMode enables Level-of-Detail (LOD) dither. Dithering the LOD calculation is
useful when performing texture mipmapping to remove the LOD bands which can occur with
mipmapping without trilinear filtering. This adds an average of 3/8 (.375) to the LOD value and needs to
be compensated for in the amount of lodbias.


tnccselect bit of textureMode selects the NCC lookup table to be used when decompressing 8-bit NCC
textures.


tclamps, tclampt bits of textureMode enable clamping of the S and T texture iterators. When clamping is
enabled, the S iterator is clamped to [0, texture width) and the T iterator is clamped to [0, texture height).
When clamping is disabled, S coordinates outside of [0, texture width) are allowed to wrap into the [0,
texture width) range using bit truncation. Similarly when clamping is disabled, T coordinates outside of [0,
texture height) are allowed to wrap into the [0, texture height) range using bit truncation.
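
As an illustration of clamping versus wrapping by bit truncation, the following sketch resolves an integer texel coordinate into [0, size). It is a software model only, not the hardware datapath: the helper name tex_coord_resolve is hypothetical, and the sketch assumes the texture dimension is a power of two so that wrapping reduces to masking off the upper bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Napalm register interface.
 * Brings an integer texel coordinate into [0, size), where size is
 * assumed to be a power of two. */
static int32_t tex_coord_resolve(int32_t coord, int32_t size, int clamp)
{
    assert(size > 0 && (size & (size - 1)) == 0); /* power of two */
    if (clamp) {
        if (coord < 0) return 0;
        if (coord >= size) return size - 1;
        return coord;
    }
    /* Wrap by bit truncation: keep only the low log2(size) bits. */
    return (int32_t)((uint32_t)coord & (uint32_t)(size - 1));
}
```

For a 256-texel dimension, coordinate 300 clamps to 255 but wraps to 44, and coordinate -1 clamps to 0 but wraps to 255.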


tformat field of textureMode specifies the texture format accessed by TREX. Note that the texture format
field is used for both reading and writing of texture memory. The following table shows the texture formats
and how the texture data is expanded into 32-bit ARGB color:


Texture format          | 8-bit Alpha                      | 8-bit Red                                        | 8-bit Green                                      | 8-bit Blue
8-bit Palette to RGBA   | {palette_r[7:2], palette_r[7:6]} | {palette_r[1:0], palette_g[7:4], palette_r[1:0]} | {palette_g[3:0], palette_b[7:6], palette_g[3:2]} | {palette_b[5:0], palette_b[5:4]}
16-bit ARGB (8-3-3-2)   | a[7:0]                           | {r[2:0], r[2:0], r[2:1]}                         | {g[2:0], g[2:0], g[2:1]}                         | {b[1:0], b[1:0], b[1:0], b[1:0]}
16-bit RGB (5-6-5)      | ...                              | ...                                              | ...                                              | ...
16-bit ARGB (1-5-5-5)   | {a[0], ...}                      | ...                                              | ...                                              | ...
16-bit ARGB (4-4-4-4)   | ...                              | {r[3:0], r[3:0]}                                 | {g[3:0], g[3:0]}                                 | {b[3:0], b[3:0]}

where a, r, g, b, and i(intensity) represent the actual values read from texture memory. The following table 
shows how 32-bit RGBA texture information is derived from the YIQ texture formats. This is detailed 
later in the nccTable description. 


Texture format          | 8-bit Alpha | 8-bit Red     | 8-bit Green | 8-bit Blue
8-bit YIQ (4-2-2)       | ...         | ncc_red[7:0]  | ...         | ncc_blue[7:0]
16-bit AYIQ (8-4-2-2)   | ...         | ...           | ...         | ...
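
The bit-replication rule used when expanding narrow color channels to 8 bits can be sketched in C for the 16-bit ARGB (8-3-3-2) format. This is an illustrative model only: the function name expand_argb8332 is hypothetical, and the packing of the source texel (alpha in bits 15:8, red in 7:5, green in 4:2, blue in 1:0) is an assumption made for the sketch.

```c
#include <stdint.h>

/* Expand one 16-bit ARGB (8-3-3-2) texel to 32-bit ARGB (8-8-8-8) by
 * bit replication:
 *   red   = {r[2:0], r[2:0], r[2:1]}
 *   green = {g[2:0], g[2:0], g[2:1]}
 *   blue  = {b[1:0], b[1:0], b[1:0], b[1:0]}
 * Assumed source packing: a in 15:8, r in 7:5, g in 4:2, b in 1:0. */
static uint32_t expand_argb8332(uint16_t texel)
{
    uint32_t a = (texel >> 8) & 0xFFu;
    uint32_t r = (texel >> 5) & 0x7u;
    uint32_t g = (texel >> 2) & 0x7u;
    uint32_t b = texel & 0x3u;

    uint32_t r8 = (r << 5) | (r << 2) | (r >> 1);
    uint32_t g8 = (g << 5) | (g << 2) | (g >> 1);
    uint32_t b8 = (b << 6) | (b << 4) | (b << 2) | b;

    return (a << 24) | (r8 << 16) | (g8 << 8) | b8;
}
```

Replicating the high-order bits into the low-order positions means the largest channel value maps to 0xFF and the smallest to 0x00, which a plain left shift would not achieve.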
There are three Texture Color Combine Units (RGB) and one Texture Alpha Combine Unit (A).


[Figure: Avenger+ Datapath - Texture Color Combine Unit. Control signals shown: tc_otherselect[2:0] (c_other),
tc_localselect[2:0] (c_local), tc_mselect_7[2:0] (c_mselect_7), tc_mselect[2:0], {tc_add_clocal, tc_add_alocal},
tc_invert_add_local, tc_outshift[1:0] (modulate 1x, 2x, 4x), clamp 0-FF, tc_invert_output. The multiply truncates
LSBs without rounding and is unique for R, G, B.]


[Figure: Avenger+ Datapath - Texture Alpha Combine Unit. Control signals shown: tca_otherselect[1:0] (a_other),
tca_localselect[1:0] (a_local), tca_mselect[2:0], {tca_add_clocal, tca_add_alocal}, tca_invert_add_local,
tca_outshift[1:0] (modulate 1x, 2x, 4x), clamp 0-FF, tca_invert_output. The multiply truncates LSBs without
rounding.]
8.77 tLOD Register
The tLOD register controls the texture mapping LOD calculations.


Bit     Name             Description
19      lod_tsplit       Texture is split (0=texture contains all LOD levels, 1=odd or even levels only, as
                         controlled by lod_odd)
20      lod_s_is_wider   S dimension is wider, for rectilinear texture maps. This is a don’t care for square
                         textures. (1=S is wider than T)
22:21   lod_aspect       Aspect ratio. Equal to 2^n. (00 is square texture, others are rectilinear: 01 is 2x1/1x2,
                         10 is 4x1/1x4, 11 is 8x1/1x8)
23      lod_zerofrac     LOD zero frac, useful for bilinear when even and odd levels are split across two
                         TREXs (0=normal LOD frac, 1=force fraction to 0)
        tdata_swap       Short swap incoming texture data (shorts 0<->1)
(30 | big pC! 


lodbias is added to the calculated LOD value, then it is clamped to the range [lodmin, min(8.0, lodmax)].
Note that whether the LOD is clamped to lodmin is used to determine whether to use the minification or
magnification filter, selected by the tminfilter and tmagfilter bits of textureMode:
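
A minimal software sketch of this bias, clamp, and filter-selection step follows. It uses floating point for readability and does not model the fixed-point formats of the hardware fields; the type lod_result and the function lod_bias_and_clamp are hypothetical names. The filter choice follows the rule that tminfilter is referenced when LOD>=LODmin and tmagfilter otherwise.

```c
/* Result of the LOD stage for one pixel. */
typedef struct {
    float lod;            /* biased and clamped LOD */
    int   use_minfilter;  /* 1 = tminfilter applies, 0 = tmagfilter applies */
} lod_result;

/* Adds lodbias to the calculated LOD, then clamps the result to
 * [lodmin, min(8.0, lodmax)]. The filter is chosen from the biased
 * value before clamping. */
static lod_result lod_bias_and_clamp(float calc_lod, float lodbias,
                                     float lodmin, float lodmax)
{
    lod_result out;
    float biased = calc_lod + lodbias;
    float upper = (lodmax < 8.0f) ? lodmax : 8.0f;

    out.use_minfilter = (biased >= lodmin);
    if (biased < lodmin) biased = lodmin;
    if (biased > upper)  biased = upper;
    out.lod = biased;
    return out;
}
```

For example, a calculated LOD of 3.0 with a bias of 0.5, lodmin of 1.0, and lodmax of 6.0 gives an LOD of 3.5 using the minification filter, while a calculated LOD of -2.0 with lodmin of 0.0 is clamped to 0.0 and uses the magnification filter.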


[Figure: LOD bias and clamp. The LOD axis runs from 0 (256x256) to 8 (1x1) with LODmin and LODmax marked;
tmagfilter and tminfilter are shown as ranges along the axis.]


[Figure: LOD bias and clamp for big textures. The LOD axis runs from 0 (2kx2k) to 11 (1x1) with LODmin and
LODmax marked; tmagfilter and tminfilter are shown as ranges along the axis.]

The tdata_swizzle and tdata_swap bits in tLOD are used to modify incoming texture data for endian
dependencies. The tdata_swizzle bit causes incoming texture data bytes to be byte order reversed, such
that bits(31:24) are swapped with bits(7:0), and bits(23:16) are swapped with bits(15:8). Short-word
swapping is performed after byte order swizzling, and is selected by the tdata_swap bit in tLOD. When
enabled, short-word swapping causes the post-swizzled 16-bit shorts to be order reversed, such that
bits(31:16) are swapped with bits(15:0). The following diagram shows the data manipulation functions
performed by the tdata_swizzle and tdata_swap bits:


[Figure: the 32-bit incoming texture data passes through the tdata_swizzle stage and then the tdata_swap
stage (shorts 0<->1), producing texture memory data [31:16] and texture memory data [15:0].]
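
The same two-stage manipulation can be modeled in software, which may be useful when checking texture downloads on a host of the opposite endianness. The function name tdata_transform is hypothetical; the order of operations (byte swizzle first, then short-word swap) follows the description of tdata_swizzle and tdata_swap.

```c
#include <stdint.h>

/* Software model of the tdata_swizzle and tdata_swap stages. */
static uint32_t tdata_transform(uint32_t data, int swizzle, int swap)
{
    if (swizzle) {
        /* Byte order reversal: [31:24]<->[7:0], [23:16]<->[15:8]. */
        data = (data >> 24) | ((data >> 8) & 0x0000FF00u) |
               ((data << 8) & 0x00FF0000u) | (data << 24);
    }
    if (swap) {
        /* Short-word swap, applied to the post-swizzle value. */
        data = (data >> 16) | (data << 16);
    }
    return data;
}
```

With input 0x11223344, swizzle alone gives 0x44332211, swap alone gives 0x33441122, and both together give 0x22114433.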


8.78 tDetail Register 


The tDetail register controls the detail texture. 


Bit   Name                    Description
17    rgb_tminfilter          RGB texture minification filter (0 = point-sampled, 1 = bilinear)
18    rgb_tmagfilter          RGB texture magnification filter (0 = point-sampled, 1 = bilinear)
19    a_tminfilter            Alpha texture minification filter (0 = point-sampled, 1 = bilinear)
20    a_tmagfilter            Alpha texture magnification filter (0 = point-sampled, 1 = bilinear)
21    rgb_a_separate_filter   0 = tminfilter and tmagfilter of textureMode define the filter for RGBA
                              1 = rgb_tminfilter and rgb_tmagfilter define the filter for RGB,
                                  a_tminfilter and a_tmagfilter define the filter for alpha


detail_factor is used in the Texture Combine Unit to blend between the main texture and the detail texture.
detail_factor (0.8 unsigned) = max(detail_max, ((detail_bias - LOD) << detail_scale)).


When rgb_a_separate_filter is set, rgb_tminfilter and rgb_tmagfilter are used for RGB filtering and
a_tminfilter and a_tmagfilter are used for A filtering.


8.79 texBaseAddr, texBaseAddr1, texBaseAddr2, and texBaseAddr38 Registers


The texBaseAddr register specifies the starting texture memory address for accessing a texture. It is used
for both rendering and texture downloading. Calculation of the texBaseAddr is described in the Texture
Memory Access section. Selection of the base address is a function of tmultibaseaddr and LODBI.


The texture base byte address is {texBaseAddr[24:4], 0000}. If the texture is tiled (texBaseAddr[0]=1), then
texBaseAddr[31:25] indicates the tile stride.


texBaseAddr
Bit     Name          Description
0       texmemtype    Texture Memory type (0=linear, 1=tiled)
24:4    texbaseaddr   Texture Memory Base Address, in 16-byte units, tmultibaseaddr==0 or LODBI==0
31:25                 Tile stride (0 to 127 tiles)


texBaseAddr1, texBaseAddr2, and texBaseAddr38 indicate the base addresses of LODs 1, 2, and 3-8 in 16-byte
units, if tmultibaseaddr=1.


texBaseAddr1, texBaseAddr2, texBaseAddr38
Name            Description
texbaseaddr1    Texture Memory Base Address, tmultibaseaddr==1 and LODBI==1
texbaseaddr2    Texture Memory Base Address, tmultibaseaddr==1 and LODBI==2
texbaseaddr38   Texture Memory Base Address, tmultibaseaddr==1 and LODBI>=3
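
The selection rule can be summarized with a short C sketch. It is illustrative only: the structure tex_base_regs and the function names are hypothetical, and the sketch assumes that texBaseAddr1, texBaseAddr2, and texBaseAddr38 hold their address in bits 24:4 in the same way as texBaseAddr.

```c
#include <stdint.h>

/* Shadow copies of the four base address registers. */
typedef struct {
    uint32_t texBaseAddr;
    uint32_t texBaseAddr1;
    uint32_t texBaseAddr2;
    uint32_t texBaseAddr38;
} tex_base_regs;

/* Returns the byte address {reg[24:4], 0000} of the register selected
 * by tmultibaseaddr and LODBI. */
static uint32_t tex_base_byte_address(const tex_base_regs *r,
                                      int tmultibaseaddr, unsigned lodbi)
{
    uint32_t reg = r->texBaseAddr;       /* tmultibaseaddr==0 or LODBI==0 */
    if (tmultibaseaddr) {
        if (lodbi == 1)      reg = r->texBaseAddr1;
        else if (lodbi == 2) reg = r->texBaseAddr2;
        else if (lodbi >= 3) reg = r->texBaseAddr38;
    }
    return reg & 0x01FFFFF0u;            /* keep bits 24:4, low nibble zero */
}

/* texBaseAddr[0]: 0=linear, 1=tiled. */
static int tex_is_tiled(uint32_t texBaseAddr)
{
    return (int)(texBaseAddr & 1u);
}

/* texBaseAddr[31:25]: tile stride, 0 to 127 tiles. */
static unsigned tex_tile_stride(uint32_t texBaseAddr)
{
    return (unsigned)((texBaseAddr >> 25) & 0x7Fu);
}
```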


8.80 trexInit1 Register 


The trexInit1 register is used for hardware initialization and configuration of the TREX portion of 
Napalm. 


Bit     Name                   Description
0       rsv_sl_int             slave reserved
6:2     ft_FIFO_sil            FBI-to-TREX interface FIFO stall input level. Free space level at which stall
                               signal is sent back to transmitting chip.
10:7    tt_FIFO_sil            TREX-to-TREX interface FIFO stall input level. Free space level at which stall
                               signal is sent back to transmitting chip.
15:12   tf_ck_del_adj          TREX-to-FBI interface clock delay adjust. Adjusts phase of the transmit clock.
16      rg_ttcii_inh           Register ttcii inhibit, used when use_rg_ttcii_inh==1. 0=expect data from upstream
                               TREX, 1=ignore data from upstream TREX.
17      use_rg_ttcii_inh       Use register ttcii inhibit to choose if data is expected from upstream TREX.
                               0=use clock sense result, 1=ignore clock sense result and use rg_ttcii_inh.
18      send_config            Send config. Transmit configuration to FBI through the tf_ interface instead of
                               texel data. 0=normal, 1=send.
19      reset_FIFOs            Reset all of the FIFOs inside TREX. 0=run, 1=assert the reset signal.
20                             Reset all of the graphics inside TREX. 0=run, 1=assert the reset signal.
22:21                          reserved
25:23   send_config_sel        Send config select. (not revision 0) Selects which data to transmit to FBI when
                               send_config==1.
                               000=reserved
                               001=reserved
                               010=reserved
                               011=trexInit1,
                               100=texBaseAddr[31:0], (for this function, 32 bits are retained and is non-maskable)
                               101,110,111=reserved.
        use_4bit_st_frac       1=use 4 bits for s,t instead of 8. Default = 0.
        a_attr_set_only        1=use only the A set of triangle attributes. Default = 0.
        nop_per_tri            1=insert a nop per triangle. Default = 0.
        always_cache_inv       1=always cache invalidate each triangle. Default = 0.
        always_4texel_needed   1=always indicate that 4 texels are needed for each pixel. Default = 0.


send_config 

It is possible to read trexInit1 and texBaseAddr through the send_config path, which sends these
registers over to the FBI section of Napalm via the graphics tf bus. When send_config = 1,
tf_data[31:0] = {a[7:0], r[7:0], g[7:0], b[7:0]}. TREX’s TC/TCA must be set to pass c_other.


8.81 nccTable0 and nccTable1 Registers


The nccTable0 and nccTable1 registers contain two Narrow Channel Compression (NCC) tables used to
store lookup values for compressed textures (used in YIQ and AYIQ texture formats as specified in tformat
of textureMode). Two tables are stored so that they can be swapped on a per-triangle basis when
performing multi-pass rendering, thus avoiding a new download of the table. Use of either nccTable0 or
nccTable1 is selected by the Narrow Channel Compressed (NCC) Table Select bit of textureMode.
nccTable0 and nccTable1 are stored in the format of the table below, and are write only.


nccTable Address   Bits    Contents
0                  31:0    {Y3[7:0], Y2[7:0], Y1[7:0], Y0[7:0]}
1                  31:0    {Y7[7:0], Y6[7:0], Y5[7:0], Y4[7:0]}
2                  31:0    {Yb[7:0], Ya[7:0], Y9[7:0], Y8[7:0]}
3                  31:0    {Yf[7:0], Ye[7:0], Yd[7:0], Yc[7:0]}
4                  26:0    {I0_r[8:0], I0_g[8:0], I0_b[8:0]}
5                  26:0    {I1_r[8:0], I1_g[8:0], I1_b[8:0]}
6                  26:0    {I2_r[8:0], I2_g[8:0], I2_b[8:0]}
7                  26:0    {I3_r[8:0], I3_g[8:0], I3_b[8:0]}
8                  26:0    {Q0_r[8:0], Q0_g[8:0], Q0_b[8:0]}
9                  26:0    {Q1_r[8:0], Q1_g[8:0], Q1_b[8:0]}
10                 26:0    {Q2_r[8:0], Q2_g[8:0], Q2_b[8:0]}
11                 26:0    {Q3_r[8:0], Q3_g[8:0], Q3_b[8:0]}


The following figure illustrates how compressed textures are decompressed using the NCC tables:


[Figure: NCC decompression. Texel data from memory passes through data alignment into the selected nccTable
register; the 9-bit per-channel results are clamped to 0-FF to give 8-bit red, green, and blue outputs, with
8-bit alpha passed alongside.]

8.82 8-bit Palette 

The 8-bit palette is used for 8-bit P and 16-bit AP modes. The palette is loaded with register writes. 
During rendering, four texels are looked up simultaneously, each an independent 8-bit address. 
Palette Write 


The palette is written through the NCC table 0 I and Q register space when the MSB of the register write
data is set. The NCC tables are not written when the I or Q NCC table register space is addressed and the MSB
of the register write data is set to 1; instead, the data is stored in the texture palette.


[Figure: Palette Load Mechanism, showing the register address (which supplies the LSB of P) and the 32-bit
register write data (bits 31:0).]


Note that the even addresses alias to the same location, as well as the odd ones. It is recommended that 
the table be written as 32 sets of 8 so that PCI bursts can be 8 transfers long. 


nccTable0 I0     P[0]=0


8.83 Command Descriptions 


8.83.1 NOP Command 


The NOP command is used to flush the graphics pipeline. When a NOP command is executed, all pending 
commands and writes to the texture and frame buffers are flushed and completed, and the graphics engine 
returns to its IDLE state. While this command is used primarily for debugging and verification purposes, it 
is also used to clear the 3D status registers (fbiTriangles, fbiPixelsIn, fbiPixelsOut, fbiChromaFail, 
fbiZfuncFail, and fbiAfuncFail). Setting nopCMD bit(0)=1 clears the 3D status registers and flushes the
graphics pipeline, while setting nopCMD bit(0)=0 has no effect on the 3D status registers but flushes the
graphics pipeline. See the description of the nopCMD register in section 5 for more information.


8.83.2 TRIANGLE Command
TO BE COMPLETED.


8.83.3 FASTFILL Command


The FASTFILL command is used for screen clears. When the FASTFILL command is executed, the
depth-buffer comparison, alpha test, alpha blending, and all other special effects are bypassed and disabled.
Prior to executing the FASTFILL command, the clipLeftRight and clipTopBottom registers must be
loaded with the rectangular area to be cleared. The fastfillCMD register is then written to
initiate the FASTFILL command. Note that clip registers define a rectangular area which is inclusive of
the clipLeft and clipTop register values, but exclusive of the clipRight and clipBottom register values.
Note also that the relative position of the Y origin (either top or bottom of the screen) is defined by
fbzMode bit(17), and that fbzMode bits(15:14) determine which RGB buffer (front or back) is written.


8.83.4 SWAPBUFFER Command 


The SWAPBUFFER command is used to swap the drawing buffers to enable smooth animation. Since the 
SWAPBUFFER command is executed and queued like all other 2D and 3D commands, proper order is 
maintained and software does not have to poll and wait for vertical retrace to manually swap buffers — this 
frees the CPU to perform other functions while the graphics engine automatically waits for vertical retrace. 
When the SWAPBUFFER command is executed, swapbufferCMD bit(0) determines whether the drawing 
buffer swapping is synchronized with vertical retrace. Typically, it is desired that buffer swapping be 
synchronized with vertical retrace to eliminate frame “tearing” typically found on single buffered displays. 
If vertical retrace synchronization is enabled for double buffered applications, the graphics command 
processor blocks on a SWAPBUFFER command until the monitor vertical retrace signal is active. If the 
number of vertical retraces seen exceeds the value stored in bits(8:1) of swapbufferCMD, then the pointer 
used by the monitor refresh control logic is changed to point to another drawing buffer. If vertical retrace 
synchronization is enabled for triple buffered applications, the graphics processor does not block on a 
SWAPBUFFER command. Instead, a flag is set in the monitor refresh control logic that automatically 
causes the data pointer to be modified in the monitor refresh control logic during the next active vertical
retrace period. Using triple buffering allows rendering operations to occur without waiting for the vertical
retrace active period.


The swapbufferCMD must be preceded by a direct write of the swapPend register. A write to the
swapPend register increments the swap buffers pending field in the status register. Conversely, when an
actual frame buffer swap occurs, the swap buffers pending field in the status register is decremented.
The swap buffers pending field allows software to determine how many SWAPBUFFER commands are
present in the Napalm FIFOs. See the description of the swapbufferCMD register in section 5 for more
information.


Since Napalm does not have fixed color buffer locations, two new registers are required for buffer display.
leftOverlayBuf and rightOverlayBuf are used by the video scanout section to determine the location of
the current display buffer. The sequence of writes for double buffering would consist of writing to the
leftOverlayBuf register and optionally the rightOverlayBuf register (for stereo operations), followed by a
direct write of swapPend, ending with a write to the swapbufferCMD register. The leftOverlayBuf and
rightOverlayBuf registers are FIFOed, allowing triple and quad buffering.
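
The write sequence can be sketched as a list of register writes built on the host. This is a sketch under stated assumptions, not driver code: the enum swap_reg, the structure reg_write, and both functions are hypothetical; the value written to swapPend is arbitrary here because only the write itself is described; and the interval field is taken to occupy swapbufferCMD bits(8:1) with the vertical retrace synchronization enable in bit(0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    REG_LEFT_OVERLAY_BUF,
    REG_RIGHT_OVERLAY_BUF,
    REG_SWAP_PEND,        /* written directly, not through the FIFO */
    REG_SWAPBUFFER_CMD
} swap_reg;

typedef struct { swap_reg reg; uint32_t value; } reg_write;

/* bit(0) = synchronize to vertical retrace, bits(8:1) = retrace count. */
static uint32_t swapbuffer_cmd(int vsync, unsigned interval)
{
    assert(interval <= 0xFFu);
    return (vsync ? 1u : 0u) | ((uint32_t)interval << 1);
}

/* Fills out[] with the writes for one buffer swap and returns how many
 * were produced. right_buf may be NULL when stereo is not in use. */
static int build_swap_sequence(reg_write out[4], uint32_t left_buf,
                               const uint32_t *right_buf,
                               int vsync, unsigned interval)
{
    int n = 0;
    out[n++] = (reg_write){ REG_LEFT_OVERLAY_BUF, left_buf };
    if (right_buf != NULL)
        out[n++] = (reg_write){ REG_RIGHT_OVERLAY_BUF, *right_buf };
    out[n++] = (reg_write){ REG_SWAP_PEND, 0 };  /* value is a placeholder */
    out[n++] = (reg_write){ REG_SWAPBUFFER_CMD, swapbuffer_cmd(vsync, interval) };
    return n;
}
```

A caller would replay the returned writes in order, issuing the swapPend write directly and the others through the command FIFO.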


8.83.5 USERINTERRUPT Command 


The USERINTERRUPT command allows for software-generated interrupts. A USERINTERRUPT 
command is generated by writing to the userIntrCMD register. userIntrCMD bit(0) controls whether a 
write to userIntrCMD generates a USERINTERRUPT. Setting userIntrCMD bit(0)=1 generates a 
USERINTERRUPT. userIntrCMD bit(1) determines whether the graphics engine stalls on software 
clearing of the user interrupt. By setting userIntrCMD bit(1)=1, the graphics engine stalls until the 
USERINTERRUPT is cleared. Alternatively, setting userIntrCMD bit(1)=0 does not stall the graphics 
engine upon execution of the USERINTERRUPT command, and additional graphics commands are 
processed without waiting for clearing of the user interrupt. An identification, or Tag, is also associated
with an individual USERINTERRUPT command, and is specified by writing an 8-bit value to
userIntrCMD bits(9:2).


User interrupts must be enabled before writes to the userIntrCMD are allowed by setting intrCtrl
bit(5)=1. Writes to userIntrCMD when intrCtrl bit(5)=0 are “dropped” and do not affect functionality. A
user interrupt is detected by reading intrCtrl bit(11), and is cleared by setting intrCtrl bit(11)=0. The tag
of a generated user interrupt is read from intrCtrl bits(19:12). See the description of the intrCtrl and
userIntrCMD registers in section 5 for more information.
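
The bit fields involved in generating and servicing a user interrupt can be captured in a few helper functions. The function names below are hypothetical; the bit positions are the ones given above (userIntrCMD bit(0), bit(1), and bits(9:2); intrCtrl bit(5), bit(11), and bits(19:12)). Any side effects of rewriting other intrCtrl bits are not modeled.

```c
#include <stdint.h>

/* Build a userIntrCMD value: bit(0)=generate, bit(1)=stall until the
 * interrupt is cleared, bits(9:2)=8-bit tag. */
static uint32_t user_intr_cmd(int generate, int stall, uint8_t tag)
{
    return (generate ? 1u : 0u) | (stall ? 2u : 0u) | ((uint32_t)tag << 2);
}

/* intrCtrl bit(5): user interrupts enabled. */
static int intr_ctrl_user_enabled(uint32_t intrCtrl)
{
    return (int)((intrCtrl >> 5) & 1u);
}

/* intrCtrl bit(11): user interrupt detected. */
static int intr_ctrl_user_pending(uint32_t intrCtrl)
{
    return (int)((intrCtrl >> 11) & 1u);
}

/* intrCtrl bits(19:12): tag of the generated user interrupt. */
static uint8_t intr_ctrl_user_tag(uint32_t intrCtrl)
{
    return (uint8_t)((intrCtrl >> 12) & 0xFFu);
}

/* Value to write back to intrCtrl to clear the user interrupt. */
static uint32_t intr_ctrl_clear_user(uint32_t intrCtrl)
{
    return intrCtrl & ~(1u << 11);
}
```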


8.84 Linear Frame Buffer Access (* FIX THIS *) 


The Napalm linear frame buffer base address is located at an 8 Mbyte offset from the memBaseAddr PCI
configuration register and occupies 4 Mbytes of Napalm address space (see section 4 for a Napalm
address map). Regardless of actual frame buffer resolution, all linear frame buffer accesses assume a
2048-pixel logical scan line width. The number of bytes per scan line depends on the linear frame
buffer access format selected in the lfbMode register. Note that for all accesses to the linear frame buffer, the
status of bit(16) of fbzMode is used to determine the Y origin of data accesses. When bit(16)=0, offset
0x0 into the linear frame buffer address space is assumed to point to the upper-left corner of the screen.
When bit(16)=1, offset 0x0 into the linear frame buffer address space is assumed to point to the bottom-left
corner of the screen. Regardless of the status of fbzMode bit(16), linear frame buffer addresses increment
as accesses are performed going from left-to-right across the screen. Also note that clipping is not
automatically performed on linear frame buffer writes if scissor clipping is not explicitly enabled
(fbzMode bit(0)=1). Linear frame buffer writes to areas outside of the monitor resolution when clipping is
disabled result in undefined behavior.


8.84.1 Linear frame buffer Writes


The following table shows the supported linear frame buffer write formats as specified in bits(3:0) of
lfbMode:


Value   Linear Frame Buffer Access Format
        16-bit formats
0       16-bit RGB (5-6-5)
1       16-bit RGB (x-5-5-5)
2       16-bit ARGB (1-5-5-5)
3       Reserved
        32-bit formats
4       24-bit RGB (8-8-8)
5       32-bit ARGB (8-8-8-8)


When writing to the linear frame buffer with a 16-bit access format (formats 0-3 and format 15 in
lfbMode), each pixel written is 16 bits, so there are 2048 bytes per logical scan line. Remember when
utilizing 16-bit access formats, two 16-bit values can be packed in a single 32-bit linear frame buffer write;
the location of each 16-bit component in screen space is defined by bit(11) of lfbMode. When using
16-bit linear frame buffer write formats 0-3, the depth component associated with each pixel is taken from the
zaColor register. When using 16-bit format 3, the alpha component associated with each pixel is taken
from the 16-bit data transferred, but when using 16-bit formats 0-2 the alpha component associated with
each pixel is taken from the zaColor register. The format of the individual color channels within a 16-bit
pixel is defined by the RGB channel format field in lfbMode bits(12:9). See the lfbMode description in
section 5 for a detailed description of the RGB channel format field.


When writing to the linear frame buffer with 32-bit access formats 4 or 5, each pixel is 32 bits, so there are
4096 bytes per logical scan line. Note that when utilizing 32-bit access formats, only a single pixel may be
written per 32-bit linear frame buffer write. Also note that linear frame buffer writes using format 4 (24-bit
RGB (8-8-8)), while 24-bit pixels, must be aligned to a 32-bit (doubleword) boundary; packed 24-bit
linear frame buffer writes are not supported by Napalm. When using 32-bit linear frame buffer write
formats 4-5, the depth component associated with each pixel is taken from the zaColor register. When
using format 4, the alpha component associated with each pixel is taken from the zaColor register, but
when using format 5 the alpha component associated with each pixel is taken from the 32-bit data
transferred. The format of the individual color channels within a 24/32-bit pixel is defined by the RGB
channel format field in lfbMode bits(12:9).


When writing to the linear frame buffer with 32-bit access formats 12-14, each pixel is 32 bits, so there
are 4096 bytes per logical scan line. Note that when utilizing 32-bit access formats, only a single pixel
may be written per 32-bit linear frame buffer write. If depth or alpha information is not transferred with the
pixel, then the depth/alpha information is taken from the zaColor register. The format of the individual
color channels within a 24/32-bit pixel is defined by the RGB channel format field in lfbMode bits(12:9).
The location of each 16-bit component of formats 12-15 in screen space is defined by bit(11) of lfbMode.
See the lfbMode description in section 5 for more information about linear frame buffer writes.
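
Because every access uses a fixed logical scan line width, the byte offset of a pixel within the linear frame buffer address space follows directly from its coordinates. The helper below is a hypothetical sketch: it takes the bytes-per-scan-line figure as a parameter rather than deriving it, so the caller supplies the value stated above for the selected access format.

```c
#include <stdint.h>

/* Byte offset of pixel (x, y) from the start of the linear frame buffer
 * address space. bytes_per_scanline is the logical scan line size for
 * the access format selected in lfbMode; bytes_per_pixel is 2 or 4. */
static uint32_t lfb_byte_offset(uint32_t x, uint32_t y,
                                uint32_t bytes_per_pixel,
                                uint32_t bytes_per_scanline)
{
    return y * bytes_per_scanline + x * bytes_per_pixel;
}
```

Whether y counts down from the top or up from the bottom of the screen depends on the Y origin setting in fbzMode; the offset arithmetic is the same in both cases.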


8.84.2 Linear frame buffer Reads 


It is important to note that reads from the linear frame buffer bypass the PCI host FIFO (as well as the 
memory FIFO if enabled) but are blocking. If the host FIFO has numerous commands queued, then the 
read can potentially take a very long time before data is returned, as data is not read from the frame buffer 
until the PCI host FIFO is empty and the graphics pixel pipeline has been flushed. One way to minimize 
linear frame buffer read latency is to guarantee that the Napalm graphics engine is idle and the host FIFOs 
are empty (in the status register) before attempting to read from the linear frame buffer. 


9. PLL Registers


[Figure: PLL block diagram (phase detector, charge pump, clock out).]


Register Name   Address     Description
pllCtrl0        0x40-0x43   Video Clock PLL
pllCtrl1        0x44-0x47   GRX Clock PLL


Genlock mode: in order for register 3da (a VGA register) to reflect the status of vsync correctly, vgaInit0[1]
needs to be set.


9.1 pllCtrl0, pllCtrl1 Registers
These registers control the frequency of the core clock (GRX Clock) and the Video Clock.


Bit     Description
        K, Post divider value
        M, PLL input divider
        N, PLL multiplier
        Test


The frequency output of the PLLs is given by:
fout = 14.31818 * (N + 2) / (M + 2) / (2 ^ K).


NOTE: if the deviceID==4, then the GRX clock PLL’s M[7:2] value is forced to 0x18 (24 decimal), which
limits the maximum frequency of the grx_clk to 141MHz. Software must adjust calculations for setting the
frequency accordingly.
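
The output frequency formula is simple enough to evaluate on the host when searching for divider settings. The function name pll_fout_mhz is hypothetical, and the result is taken to be in MHz on the assumption that the 14.31818 constant is the reference frequency in MHz.

```c
#include <assert.h>

/* fout = 14.31818 * (N + 2) / (M + 2) / 2^K */
static double pll_fout_mhz(unsigned n, unsigned m, unsigned k)
{
    assert(k < 31u);
    return 14.31818 * (double)(n + 2u) / (double)(m + 2u) / (double)(1u << k);
}
```

For example, N=54, M=6, K=0 gives approximately 100.23, and raising K to 1 halves that to approximately 50.11.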


[Figure: PLL test circuit with select inputs S0, S1, S2, test clock input TestCLK0, and output CLK532.]


PLL_EN   S0   S1   S2   CLK66        CLK266       CLK532     Mode
0        0    0    0    0            0            0          PLL Disable
X        1    1    1    TestCLK2     TestCLK1     TestCLK0   PLL Bypass mode
1        0    0    0    Ref          Ref*4        Ref*8      Close loop
1        1    0    0    TestCLK0/3   TestCLK0/2   TestCLK0   Test Mode


10. DAC Registers


Register Name   I/O Address   Bits   R/W   Description
dacMode         0x4c-0x4f     4:0    R/W   Dac Mode 2:1 or 1:1
dacAddr         0x50-0x53     8:0    R/W   Dac palette address
dacData         0x54-0x57     23:0         Dac color value
                0x58-0x5b     n/a

10.1 dacMode


Bit   Description
0     Dac Mode 2:1 or 1:1
1     Enable DPMS on Vsync
2     Force Vsync value.
3     Enable DPMS on Hsync
4     Force Hsync value.

10.2 dacAddr


Bit   Description
8:0   Palette Address


This is the 9 bit CLUT address used for programming the CLUT. Unlike the VGA mechanism, this address 
does not auto increment, but has access to the entire 512 entries in the CLUT. 


10.3 dacData


Bit    Description
23:0   Dac color value


This is the 24 bit RGB value at the index programmed into dacAddr. The color values are always stored 
with red in bits [23:16], green in bits [15:8] and blue in bits [7:0]. 
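
Since dacAddr does not auto increment, loading a range of CLUT entries requires an address write before every data write. The sketch below illustrates this with two hypothetical helpers; the register pointers are passed in by the caller, and how they are mapped is outside the scope of the example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a color as red[23:16], green[15:8], blue[7:0]. */
static uint32_t dac_pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Load count CLUT entries starting at index first (0..511). */
static void dac_load_clut(volatile uint32_t *dacAddr, volatile uint32_t *dacData,
                          const uint32_t *rgb, unsigned first, unsigned count)
{
    assert(first <= 512u && count <= 512u - first);
    for (unsigned i = 0; i < count; i++) {
        *dacAddr = (first + i) & 0x1FFu;   /* no auto increment: set index each time */
        *dacData = rgb[i] & 0x00FFFFFFu;   /* 24-bit RGB value */
    }
}
```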
11. Video Registers


Register Name                  I/O Addr   Bits   R/W   Description
...                            ...        ...    R/W   Cursor Pattern Address
...                            ...        ...    ...   Video In Format
vidSerialParallelPort          ...        ...    R/W   Serial and Parallel Ports
vidInXDecimDeltas              ...        ...    ...   Video In horizontal decimation delta 1 & 2
vidInDecimInitErrs             0x80       31:0   R/W   Video In horizontal and vertical decimation initial error term
...                            ...        17:0   R/W   Video Pixel Buffer Threshold
...                            ...        31:0   R/W   Chroma Key maximum value
vidInStatusCurrentLine         0x94       18:0   R     Video In Status and Current Scan Line
...                            ...        23:0   R/W   Screen resolution
...                            ...        ...    ...   Overlay Start Screen Coordinates
                                                       Start Surface Coordinates [31:28]
...                            ...        23:0   R/W   Overlay End Screen Coordinates
vidOverlayDudxOffsetSrcWidth   0xa8       31:0   R/W   Overlay horizontal magnification factor initial offset (bit 18:0)
                                                       Overlay source surface width (bit 31:19)
vidOverlayDvdy                 ...        19:0   ...   Overlay vertical magnification factor
vidOverlayDvdyOffset           0xe0       18:0   R/W   Overlay vertical magnification factor initial offset
vidDesktopStartAddr            0xe4       25:0   R/W   Desktop start address
vidDesktopOverlayStride        ...        ...    ...   Desktop and Overlay strides (linear or tile)
vidInAddr0                     ...        25:0   R/W   Video In Buffer 0 starting address
vidInAddr1                     ...        25:0   R/W   Video In Buffer 1 starting address
vidInAddr2                     0xf4       25:0   R/W   Video In Buffer 2 starting address
vidCurrOverlayStartAddr        0xfc       25:0   R     Current overlay start address in use


11.1.1 3.1.1. vidTvOutBlankVCount 


If TV Out Genlock is enabled (VidInFormat[16] == 1’b1, VidInFormat[18] == 1’b1), the vertical blank_n 
signal is de-asserted when the number of positive edges of tv_hsync after the positive edge of tv_vsync == 
vidTvOutBlankVCount bits[10:0]. 


The vertical blank_n signal is re-asserted when the number of positive edges of tv_hsync after the positive edge 
of tv_vsync == vidTvOutBlankVCount bits[26:16]. 

Output blank_n == horizontal blank_n AND vertical blank_n. 


Note that the value in bits[26:16] needs to be greater than the value in bits[10:0]. The clock cycles are based on the 
clock coming in through the tv_inclk pin. 


Bit    Description
10:0   The number of tv_hsync LEADING edges after the LEADING edge of tv_vsync before the 
       vertical active region starts (i.e., vertical blank becomes deasserted). 
15:11  Reserved 
26:16  The number of tv_hsync LEADING edges after the LEADING edge of tv_vsync before the 
       vertical active region ends (i.e., vertical blank is re-asserted). 
31:27  Reserved 
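
As a rough illustration of the layout above, the two counts can be packed into one register value like this. The helper name `pack_blank_count` is hypothetical; the same field positions (bits[10:0] and bits[26:16]) are also used by vidTvOutBlankHCount.

```python
def pack_blank_count(active_start, active_end):
    """Pack the start count into bits [10:0] and the end count into
    bits [26:16]. The end count must be greater than the start count."""
    for name, value in (("active_start", active_start), ("active_end", active_end)):
        if not 0 <= value <= 0x7FF:
            raise ValueError(f"{name} must fit in 11 bits")
    if active_end <= active_start:
        raise ValueError("bits[26:16] must be greater than bits[10:0]")
    return (active_end << 16) | active_start
```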


11.1.2 3.1.2. vidMaxRgbDelta 


The vidMaxRgbDelta register specifies the threshold values allowed for the deviation of a pixel’s luma and 
chroma components when the pixel is used in the video filter (4x1 tap filter or 2x2 box filter). If the 
absolute deviation of a pixel component exceeds its threshold, the particular component will be replaced by 
that of the center pixel before the pixel is used in the filter. 


Maximum blue/V/Cr delta for video filtering (unsigned) 


Maximum green/U/Cb delta for video filtering (unsigned) 


21:16 Maximum red/Y delta for video filtering (unsigned) 
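
The thresholding rule described above can be sketched per component as follows. This is a minimal model, not the hardware datapath: the function name `limit_taps` and the choice of which tap counts as the center pixel are assumptions made for illustration.

```python
def limit_taps(taps, center_index, max_delta):
    """Replace any component whose absolute deviation from the center
    pixel's component exceeds max_delta with the center value, as is
    done before the taps enter the 4x1 or 2x2 filter."""
    center = taps[center_index]
    return [center if abs(tap - center) > max_delta else tap for tap in taps]
```

For example, with a center value of 100 and a maximum delta of 32, a neighbouring value of 200 is replaced by 100 while a value of 120 passes through unchanged.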


11.1.3 3.1.3 vidProcCfg Register 


The vidProcCfg register is the general configuration register for the Video Processor. It is written by the 
host upon reset only. 


0    1: Video Processor on, VGA mode off; 0: Video Processor off, VGA mode on. 


Half mode. 0 = disabled. 1 = enabled, where the desktop stride is added every other line. 


ChromaKeyEnable. 0 = off. 1 = on. 


ChromaKeyResultInversion: (0 = if desktop color matches or falls 
within the chroma-key color range; 1 = if desktop color does not 
match or fall within the chroma-key range). 


Video-in data displayed as overlay enable. 0 = do not display the video-in buffer directly 
as overlay. 1= use the video-in buffer address as the overlay start address (auto-flipping). 
Desktop clut bypass. 0 = do not bypass the clut in the RAMDAC, 1 = bypass the clut 


Overlay clut bypass. 0 = do not bypass the clut in the RAMDAC, 1 = bypass the clut 

Desktop clut select. 0 = use the lower 256 entries of the clut. 1 = use the upper 256 
entries. 

Overlay clut select. 0 = use the lower 256 entries of the clut. 1 = use the upper 256 
entries. 

Overlay horizontal scaling enable. 0=disabled. 1=enabled. 
Magnification factor determined by vidOverlayDudx. 
Overlay vertical scaling enable. 0=disabled. 1=enabled. 
Magnification factor determined by vidOverlayDvdy. 
Overlay filter mode 

00: point sampling 

01: 2x2 dither subtract followed by 2x2 box filter (for 3d only) 
10: 4x4 dither subtract followed by 4x1 tap filter (for 3d only) 
11: bilinear scaling 

Desktop pixel format 
000: 8bit palettized 
001: RGB565 undithered 
010: RGB24 packed 
011: RGB32 
100: Reserved 
101: Reserved 
110: Reserved 
111: Reserved 

Overlay pixel format 
000: 
001: RGB565 undithered 
010: 
011: 
100: YUV411 
101: YUYV422 
110: UYVY422 
111: RGB565 dithered 


Backend deinterlacing for video overlay. 0 = No deinterlacing in the backend pipe. 1 = 
Backend deinterlacing (Bob method). Bob method displays either the even or odd frame 
at a time, and interpolates two interlaced lines to get the missing field. It is not supported 
in 2X mode. 


How to program for Backend deinterlacing (Bob method): 

The only thing this option affects is that when the video processor displays the even field, it adds 0.5 to the 
initial vertical offset (initial dvdy offset) used by the backend bilinear scaler. Everything else is the same. 
Since deinterlacing in the backend uses the bilinear scaler unit to interpolate between two interlaced lines, 
the host needs to enable bilinear filtering, overlay vertical scaling, overlay horizontal scaling, and set up the 
initial dvdy offset, dvdy, initial dudx offset and dudx correctly according to the desired magnification 
factor between the source video and displayed video. The suggested settings for backend 
deinterlacing without horizontal magnification are: bilinear filter enable = 1, overlay vertical scaling enable 
= 1, overlay horizontal scaling enable = 0, initial dvdy offset = 0.25, dvdy = 0.5. Initial dudx offset and 
dudx are don’t cares. 

Backend deinterlacing is not supported for 2x mode (2-pixel per video clk mode) since bilinear filtering is 
not available in 2x mode. 


How to program for stereo video display: 

The mainstream way of stereo support is alternating field display with quad buffers. The host writes the left 
overlay start address into leftOverlayBuf register and the right address into rightOverlayBuf register. These 
registers are described in the 3D section of the spec. Then the host executes swap buffer command, and the 
pair of addresses will be pushed into the overlay start address fifo between the 3D and the video. Video 
will switch between the two addresses for screen refresh. When the host is ready to provide the next pair of 
right/left buffer addresses, it executes swap buffer command again, and the new pair of addresses will be 
pushed into the address fifo. Only at vertical retrace and the completion of refreshing the right frame will 
video use the new pair of overlay addresses. The stereo_out pin will indicate the left/right field of the frame 
being displayed. Also, when stereo display is enabled, the swap buffer command needs to be executed with 
sync to vsync enabled (mid-frame swap disabled). Mid-frame swap and stereo video display are mutually 
exclusive features. 


When dual buffering is used, there are two different options, and each has its own drawback. In the first, the host 
enables stereo video display, writes both overlay buffer addresses into the leftOverlayBuf and 
rightOverlayBuf registers, and executes the swap buffer command only once. In this case, video will continue 
to switch between the two addresses regardless of whether the next frame is ready or not. Stereo_out will 
indicate the left/right field. 

The other method is to turn off stereo, so that video looks at only the leftOverlayBuf register. In this case, 
the host executes a swap buffer command only when it is done rendering the next frame and has written the 
buffer address to the leftOverlayBuf register. Therefore, video may display in time-sequence: left -> left -> 
right -> left frames depending on how long a frame takes to render. However, since stereo is disabled, 
stereo_out will remain low all the time. 


The stereo_out pin is used to control the shutter of stereo glasses. Another alternative supported by 
StereoGraphics’ Simuleyes is to use a white strip displayed at the bottom of the monitor to control the 
shuttering of the glasses. Depending on the length of the white strip (1/4 or 3/4 of the screen width), the 
glasses detect if the screen is displaying the right or left field. 


Finally, there is alternate line display for Head Mount Displays. The requirement is left field on scanline 0, 
right field on scanline 1, and so on in the display. Napalm video does not have support for this. 


11.1.4 3.1.4 hwCurPatAddr Register 


The hwCurPatAddr register stores the starting address of two monochrome cursor patterns. Each pattern 
is a bitmap 64 bits wide and 64 bits high (8192 bits in total for the two patterns). The two patterns are stored in such a way 
that pattern 0 always resides in the lower half (least significant 64 bits) of a 128-bit word and pattern 1 in the 
upper half. In other words, each 128-bit word consists of one line from pattern 0 and one line from pattern 
1. At each horizontal retrace, the Video Processor checks to see whether the cursor location falls on the 
current scanline. If so, it fetches eight words of cursor patterns from memory at a time. The eight words 
are then stored in the on-chip ram for use in the next eight scanlines. This reduces the number of memory 
accesses for cursor patterns from 64 to 8 times per screen refresh. Cursor patterns always reside in linear 
address space, and the linear stride is always 16 bytes. The video processor figures out the shape and color 
of the cursor for the current scanline according to the following table: 


Bit from Pattern 0   Bit from Pattern 1   Displayed cursor (Microsoft window)   Displayed cursor (X11)
0                    0                    HWCurC0                               Current screen color
0                    1                    HWCurC1                               Current screen color
1                    0                    Current screen color                  HWCurC0
1                    1                    NOT current screen color              HWCurC1


Bit    Description 
       Physical address of where the cursor pattern resides in the memory 
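
The cursor storage layout and color selection can be sketched as below. The function names are hypothetical, 24-bit colors are assumed, and the row order of the selection table (pattern bits 00, 01, 10, 11) is an assumption made where the source table is hard to read.

```python
CURSOR_STRIDE_BYTES = 16  # one 128-bit word per cursor line


def cursor_line_address(base_address, line):
    """Address of the 128-bit word holding line `line` of both patterns."""
    if not 0 <= line < 64:
        raise ValueError("cursor line must be in 0..63")
    return base_address + line * CURSOR_STRIDE_BYTES


def split_cursor_word(word128):
    """Pattern 0 is the least significant 64 bits, pattern 1 the upper 64."""
    return word128 & ((1 << 64) - 1), word128 >> 64


def cursor_pixel(bit0, bit1, screen, color0, color1, x11=False):
    """Pick the displayed color from one bit of each pattern
    (assumed row order: 00, 01, 10, 11)."""
    if x11:
        table = (screen, screen, color0, color1)
    else:
        table = (color0, color1, screen, ~screen & 0xFFFFFF)
    return table[(bit0 << 1) | bit1]
```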


11.1.5 3.1.5  hwCurLoc Register 


The hwCurLoc register stores the x and y coordinates of the bottom right corner of the cursor. The 
coordinates are unsigned, and range from 0 to 2047. This allows a partial cursor to be displayed in all 
edges of the screen. 


10:0   X coordinate of the bottom right corner of the cursor. Undefined upon reset. 
26:16  Y coordinate of the bottom right corner of the cursor. Undefined upon reset. 


11.1.6 3.1.6 hwCurC0 Register 


The hwCurC0 register stores color 0 of the cursor. 


11.1.7 3.1.7 hwCurC1 Register 
The hwCurC1 register stores color 1 of the cursor. 


11.1.8 3.1.8 vidInFormat 
The VidInFormat register allows the host to specify the data format of the video-in and tv-out data. 


(VMI only) Video-In data format 
110: 8bit YCbCr 4:2:2 (UYVY) 
101: 8bit YCbCr 4:2:2 (YUYV) 
100: 8bit YCbCr 4:1:1 
(VMI only) Video-In de-interlacing mode. (0 = No deinterlacing applied to the video data coming 
in; 1 = Weave method deinterlacing, i.e. the video-in port will merge two consecutive VMI frames 
into one inside the frame buffer before signaling a frame is done in the vidStatus register.) 


(TV out only) G4 for posedge (1=Brooktree TV out support; 0=Chrontel) 
1: Brooktree TV encoder samples at falling edge for the following data; 0: Chrontel TV encoder 
samples at rising edge for the following data 

Data[11] G0[4] 
Data[10] G0[3] 
Data[9]  G0[2] 
Data[8]  B0[7] 
Data[7]  B0[6] 
Data[6]  B0[5] 
Data[5]  B0[4] 
Data[4]  B0[3] 
Data[3]  G0[0] 
Data[2]  B0[2] 
Data[1]  B0[1] 
Data[0]  B0[0] 

1: Brooktree TV encoder samples at rising edge for the following data; 0: Chrontel TV encoder 
samples at falling edge for the following data 

Data[11] R0[7] 
Data[10] R0[6] 
Data[9]  R0[5] 
Data[8]  R0[4] 
Data[7]  R0[3] 
Data[6]  G0[7] 
Data[5]  G0[6] 
Data[4]  G0[5] 
Data[3]  R0[2] 
Data[2]  R0[1] 
Data[1]  R0[0] 
Data[0]  G0[1] 
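
The two 12-bit transfers listed above can be modelled with a pair of lookup tables. This is a sketch of the bit mapping only; the function names and the idea of returning the two words as a tuple are illustrative assumptions, and which clock edge carries which word depends on the encoder selected.

```python
# List index = tv_data bit; entry = (color channel, bit within that channel).
FIRST_WORD = [("B", 0), ("B", 1), ("B", 2), ("G", 0), ("B", 3), ("B", 4),
              ("B", 5), ("B", 6), ("B", 7), ("G", 2), ("G", 3), ("G", 4)]
SECOND_WORD = [("G", 1), ("R", 0), ("R", 1), ("R", 2), ("G", 5), ("G", 6),
               ("G", 7), ("R", 3), ("R", 4), ("R", 5), ("R", 6), ("R", 7)]


def _gather(mapping, channels):
    word = 0
    for data_bit, (channel, src_bit) in enumerate(mapping):
        word |= ((channels[channel] >> src_bit) & 1) << data_bit
    return word


def scramble_pixel(red, green, blue):
    """Split one 24-bit RGB pixel into the two 12-bit tv_data words."""
    channels = {"R": red & 0xFF, "G": green & 0xFF, "B": blue & 0xFF}
    return _gather(FIRST_WORD, channels), _gather(SECOND_WORD, channels)


def unscramble_pixel(first, second):
    """Inverse of scramble_pixel, useful for checking the mapping."""
    channels = {"R": 0, "G": 0, "B": 0}
    for mapping, word in ((FIRST_WORD, first), (SECOND_WORD, second)):
        for data_bit, (channel, src_bit) in enumerate(mapping):
            channels[channel] |= ((word >> data_bit) & 1) << src_bit
    return channels["R"], channels["G"], channels["B"]
```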


VMI interface enable. 

Genlock enable. 

The VMI logic of the video controller uses vmi_pixclk_in as its clock. By setting bit 16 to 1, it 
allows Napalm to genlock to the clock of an external VMI device or TV encoder. 

0: The remaining video logic uses a separate video clock from the on-chip PLL. For VMI and TV 
encoder slave mode. 

1: The remaining video logic uses the genlock source (as selected by vidInFormat bit[18]) to drive 
its clock. If the genlock source is VMI, Napalm is genlocked to vmi_pixclk_in. If the genlock 
source is TV encoder, Napalm is genlocked to tv_inclk. For TV encoder master mode. 

(VMI/TV out) not_use_vga timing signal (Timing signals include vert_exra, display_ena, 
vfrontporch active, vbackporch active, vblank, vg 

0: Use the timing signals supplied by the VGA. For VMI and TV out slave mode. 

1: Do not use the timing signals from the VGA. Timing signals are either supplied by the genlock 
source (as selected by vidInFormat bit[18]) or generated internally by the video controller. If the 
genlock source is VMI, vmi_vsync and vmi_hsync are input from the VMI device (vmi_vactive 
is always an input to Napalm from the VMI device). If the genlock source is TV encoder, 
tvout_vsync and tvout_hsync are input from the TV encoder (tvout_blank is always an output 
from Napalm to the TV encoder device). 

Genlock source select. (0=VMI (default), 1=TV encoder in master mode) 

0: The inputs vmi_pixclk_in, vmi_vsync, and vmi_hsync are used to drive 


TvOut select display_ena. If on, selects display_ena instead of vga_blank for driving output 
video. (0=off; 1=on). 
Video-In horizontal decimation enable. (0=off; 1=on) 
Video-In vertical decimation enable. (0=off; 1=on) 
23 
[24 | 
Reserved 


The following configurations of external VMI and/or TV encoder devices are supported by Napalm — 


Configuration                      VidInFormat[16]   VidInFormat[18]   VidInFormat[17] 
TV encoder master                                    1 (TV encoder) 
TV encoder slave                                     Don’t care 
VMI genlock                                          0 (VMI) 
VMI slave*                                           Don’t care 
TV encoder master + VMI slave                        1 (TV encoder) 
TV encoder slave + VMI genlock*                      0 (VMI) 

* While it’s possible to configure the VMI device as master mode (i.e., genlock is enabled, VMI is the 
genlock source, and not_use_vga timing signal==1), it probably doesn’t make sense to do so, because one 
pixel of input data from the VMI device requires two clocks whereas one pixel of output data to the 
monitor or TV encoder requires only one clock, so the timing of the input and output devices can’t be 
aligned. 


VMI field detection 


Note that the polarity of the VMI Vsync, Hsync, and Vactive signals is programmable. The inactive-going 
edge of the Vsync signal indicates whether the field is odd or even. If Hsync is active during the inactive-going 
edge of Vsync, the field is even. If Hsync is inactive, the field is odd. 
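
The field-detection rule can be expressed as a small helper. This is a sketch under assumptions: `hsync_level` stands for the Hsync pin level sampled at the inactive-going edge of Vsync, and `hsync_active_high` models the programmable polarity; neither name comes from the register definitions.

```python
def vmi_field(hsync_level, hsync_active_high=True):
    """Classify the field from the Hsync level sampled at the
    inactive-going edge of Vsync: Hsync active means even, inactive
    means odd."""
    hsync_active = bool(hsync_level) == hsync_active_high
    return "even" if hsync_active else "odd"
```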


11.1.9 3.1.9 vidSerialParallelPort Register 


The vidSerialParallelPort register controls the chip’s I2C, DDC, GPIO, and the host port interface. Since 
VMI and ROM share pins for their interface, a pin can be input or output depending on which interface has 
control of the pin at that time. GPIO[0] is a hardwired output pin designed to be an output enable of the 
on-board tristate drivers. GPIO[0] is asserted low when the VMI device has control of the shared pins and is 
driving pixdata[7:0], vmi_rdy_n, and vmi_intreq_n as input to Napalm. GPIO[0] is pulled high when 
ROM controls the shared pins, and pixdata[7:0], vmi_rdy_n, and vmi_intreq_n are outputs of Napalm. 
GPIO[1] is software programmable, and is used to control the output enable of the on-board tristate drivers 
for vmi_pixclk, vmi_vsync, vmi_hsync, and vmi_vactive. These are the signals that should be continually 
driven by the external VMI device even when the ROM is using the shared pins (ROM does not use the 
vmi_pixclk, vmi_vsync, vmi_hsync, and vmi_vactive pins). Otherwise the internal state of the VMI 
controller in Napalm may be corrupted. Vmi_cs_n cannot be used in lieu of GPIO[1] for this purpose 
because the chip select pin can be turned off by the VMI parallel host interface enable bit (bit 0 below). 


Bit    Owner        Description
0      host         VMI parallel host interface enable. (0=off, 1=on); Default to upon reset. 
                    Mode A / Mode B 
3      host         Mode A: VMI RW_N (Read/Write_n). Mode B: VMI WR_N (Write). 
4      VMI          Mode A: VMI data DTACK_N (Data Acknowledge). Mode B: VMI data RDY (Data Ready). 
5      host         VMI Data output enable_n. (0 = enabled, 1 = disabled); Default to 1 upon reset. 
13:6   host/VMI     VMI Data (Input / Output) 
17:14  host         VMI Address 
                    DDC interface 
18     host         DDC port enable (0 = disabled, 1 = enabled). Default to 0 upon reset. 
19     host         DDC DCK write (0 = DCK pin is driven low, 1 = DCK pin is tri-stated). 
                    When this pin is tri-stated, other devices can drive this line, and the final state of the 
                    pin is reflected in bit 26. Default to 1 upon reset. 
                    DDC DDA write (0 = DDA pin is driven low, 1 = DDA pin is tri-stated). 
                    When this pin is tri-stated, other devices can drive this line, and the final state of the 
                    pin is reflected in bit 27. Default to 1 upon reset. 
       Monitor      DDC DCK state (read only, 0 = low, 1 = tri-stated which means no device is driving 
                    this pin) 
       Monitor      DDC DDA state (read only, 0 = low, 1 = tri-stated which means no device is driving 
                    this pin) 
                    I2C SCK write (0 = SCK pin is driven low, 1 = SCK pin is tri-stated). 
                    When this pin is tri-stated, other devices can drive this line, and the final state of the 
                    pin is reflected in bit 21. Default to 1 upon reset. 
                    I2C SDA write (0 = SDA pin is driven low, 1 = SDA pin is tri-stated). 
                    When this pin is tri-stated, other devices can drive this line, and the final state of the 
                    pin is reflected in bit 22. Default to 1 upon reset. 
26     VMI/encoder  I2C SCK state (read only, 0 = low, 1 = tri-stated which means no device is driving this 
                    pin) 
27     VMI/encoder  I2C SDA state (read only, 0 = low, 1 = tri-stated which means no device is driving this 
                    pin) 
28     host         VMI reset_n (1 = normal, 0 = reset VMI device). Default to 0 upon reset. 
                    output only gpio GPIO[1] output 
30     VMI/encoder  input only gpio GPIO[2] input 
                    TV out reset_n (1 = normal, 0 = reset TV out device). Default to 0 upon reset. 


VMI and ROM Pin Sharing (see Notes below) 

Name            ROM Access Function   VMI Access Function 
Pixdata/a0                            Y0/Cr0/Cb0* 
Pixdata/a1                            Y1/Cr1/Cb1* 
Pixdata/a2                            Y2/Cr2/Cb2* 
Pixdata/a3                            Y3/Cr3/Cb3* 
Pixdata/a4                            Y4/Cr4/Cb4* 
Pixdata/a5                            Y5/Cr5/Cb5* 
Pixdata/a6                            Y6/Cr6/Cb6* 
Pixdata/a7                            Y7/Cr7/Cb7* 
vmi_adr/a8                            vaddr0 
vmi_adr/a9                            vaddr1 
vmi_adr/a10                           vaddr2 
vmi_adr/a11                           vaddr3 
vmi_cs_n        CANNOT USE!           vmi_cs_n 
vmi_rw          A14                   vmi_rw_n/vmi_wr_n 
vmi_ds_n/a15                          vmi_ds_n/vmi_rd_n 
vmi_rdy/busy                          vmi_dtack_n/vmi_rdy_n* 
vmi_hdata                             vmi_hd_0 
vmi_hdata                             vmi_hd_1 
vmi_hdata                             vmi_hd_2 
vmi_hdata                             vmi_hd_3 
vmi_hdata                             vmi_hd_4 
vmi_hdata                             vmi_hd_5 
vmi_hdata                             vmi_hd_6 
vmi_hdata                             vmi_hd_7 
Hsync           NOT USED              hsync 
Vsync           NOT USED              vsync_n 
Blank_n         NOT USED              blank_n 
pix_clk_in      NOT USED              vid_clk_in 
vmi_intreq_n    A13                   vmi_int_n* 
vmi_reset_n     NOT USED              reset_n 
rom_oe_n        rom_oe_n              CANNOT USE! 
rom_we_n        rom_we_n              CANNOT USE! 
i2c_clk         NOT USED              i2c_clk 
i2c_data        NOT USED              i2c_data 
gp_io[0]        vmi_oe_n              vmi_oe_n 
gp_io[1]        vmi_sync_oe_n         vmi_sync_oe_n 
gp_io[2] 

ROM Access Type column: out (12 entries), out (3 entries), in/out (8 entries), out, out, in. 


* means the signal may be buffered from the VMI data bus to ensure that it is not driven during ROM 
accesses. 


TV Encoder Pins (see Notes below) 


Pin Name      Number   Function (Scrambled)    Function (Unscrambled)   Type 
                       rising/falling edge     rising/falling edge 
tv_data[0]             B0/G1                   G3/R7                    out 
tv_data[1]             B1/R0 
tv_data[2]             B2/R1 
tv_data[3]             G0/R2 
tv_data[4]             B3/G5 
tv_data[5]             B4/G6 
tv_data[6]             B5/G7 
tv_data[7]             B6/R3 
tv_data[8]             B7/R4 
tv_data[9]             G2/R5 
tv_data[10]            G3/R6 
tv_data[11]            G4/R7 
tv_clk_out             Clock_out 
tv_hsync               Hsync 
tv_vsync               Vsync 
tv_blank               Blank_n 
tv_inclk               Clock_in 
tv_reset               Reset_n 


Notes: 

1. ROM access and VMI video data/host port access can only be performed separately. 
2. The Type fields in the tables above are referenced to Napalm. 
3. Programming of the VMI or TV Encoder device can be done via I2C, e.g. setting PAL mode. 
4. The TV encoder must be able to operate in Master mode where it supplies the clock, vsync, hsync, 
   blank and Napalm outputs tv_clk_out (delayed version of tv_inclk) and synchronous data. 
5. We must route a reference board to make sure the pin functions have been shared to provide a decent 
   route. 
6. The ROM cs_n is tied to GND; the oe_n and we_n are used to control read/write respectively. 


11.1.10 3.1.10 vidTvOutBlankHCount 


If TV Out Genlock is enabled (VidInFormat[16] == 1’b1, VidInFormat[18] == 1’b1), and the 
not_use_vga timing signal is asserted, vidTvOutBlankHCount bits[10:0] contains the number of clock 
cycles after the leading edge of tv_hsync before the horizontal active region starts (i.e., horizontal blank 
becomes deasserted). 


vidTvOutBlankHCount bits[26:16] contains the number of clock cycles after the leading edge of tv_hsync 
before the horizontal active region ends (i.e., horizontal blank is re-asserted). 


Output blank_n == horizontal blank_n AND vertical blank_n. 


Note that the value in bits[26:16] needs to be greater than the value in bits[10:0]. The clock cycles are based on the 
clock coming in through the tv_inclk pin. 

11.1.11 3.1.11 vidInXDecimDeltas (for VMI downscaling Bresenham Engine)/ 
vidTvOutBlankHCount (for TV out master mode) 

If the VideoIn Interface is configured to VMI mode (i.e., VidInFormat[15:14] == 2’b01), vidInXDecimDeltas 
bits[11:0] contain the width of the destination video-in surface (width of the video overlay stored in the 
frame buffer) in number of pixels. vidInXDecimDeltas bits[27:16] contain the width difference between 
the source video-in surface (from VMI port) and destination video-in surface in number of pixels (Source - 
Destination). 


Bit    Description 
11:0   The positive (unsigned) value added to the error term when the horizontal Bresenham error 
       term is <0. It is programmed to be the width of the destination video-in surface in number 
       of pixels. 
15:12  reserved 
27:16  The positive (unsigned) value subtracted from the error term when the horizontal Bresenham error 
       term is >=0. It is programmed to be the difference between the width of the source and 
       destination video-in surfaces (Source - Destination) in number of pixels. 
31:28  reserved 


11.1.12 3.1.12 vidInDecimInitErrs 


Bit    Description 
12:0   The signed (2’s complement) initial value of the error term in the horizontal Bresenham 
       accumulator 
15:13  reserved 
28:16  The signed (2’s complement) initial value of the error term in the vertical Bresenham 
       accumulator 
31:29  reserved 


11.1.13 3.1.13 vidInYDecimDeltas 

If the VideoIn Interface is configured to VMI mode (i.e., VidInFormat[15:14] == 2’b01), vidInYDecimDeltas 
bits[11:0] contain the height of the destination video input window (height of the video overlay stored in 
the frame buffer) in number of lines. vidInYDecimDeltas bits[27:16] contain the height difference between the source 
video surface (from VMI port) and destination video input window in number of lines (Source - 
Destination). 


Bit    Description 
11:0   The positive value added to the error term when the vertical Bresenham error term is <0 
15:12  reserved 
27:16  The positive value subtracted from the error term when the vertical Bresenham error term is >=0 
31:28  reserved 


Bresenham scaler for scaling down a video window in the horizontal direction: 
    error = vidInXDecimInitErr 
    repeat until the source pixels of a video window scanline are exhausted 
        if (error < 0) 
            move to next source pixel 
            error = error + vidInXDecimDelta1 
        else 
            select the current source pixel as the destination pixel 
            move to next source pixel 
            error = error - vidInXDecimDelta2 


Bresenham scaler for scaling down a video window in the vertical direction: 
    error = vidInYDecimInitErr 
    at each VideoIn Hsync 
        if (error < 0) 
            skip the whole line of video in data 
            error = error + vidInYDecimDelta1 
        else 
            select the current line of video in data 
            error = error - vidInYDecimDelta2 
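
The pseudocode above can be exercised with a small software model. This is a sketch, not the hardware implementation: the function name `decimate` is hypothetical, delta 1 is taken to be the destination size and delta 2 the source-minus-destination difference as the register descriptions state, and an initial error of 0 is assumed by default. The same routine models either direction, with `source` being the pixels of a scanline or the lines of a field.

```python
def decimate(source, dest_count, init_error=0):
    """Select dest_count samples out of `source` using the Bresenham
    error accumulator described above."""
    if not 0 < dest_count <= len(source):
        raise ValueError("destination size must be in 1..len(source)")
    delta1 = dest_count                 # added when error < 0
    delta2 = len(source) - dest_count   # subtracted when error >= 0
    error = init_error
    selected = []
    for sample in source:
        if error < 0:
            error += delta1             # skip this sample
        else:
            selected.append(sample)     # keep this sample
            error -= delta2
    return selected
```

With an initial error of 0, scaling 8 samples down to 4 keeps samples 0, 2, 4 and 6; other initial error values shift which samples are kept.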


11.1.14 3.1.14 vidPixelBufThold 


The vidPixelBufThold determines how many empty slots in each of the three pixel buffers will trigger 
refilling of the buffers. 


5:0    Primary pixel buffer low watermark (0x0 = 1 empty slot; 0x3f = 64 empty slots) 
11:6   Secondary pixel buffer 0 low watermark (0x0 = 1 empty slot; 0x3f = 64 empty slots) 
17:12  Secondary pixel buffer 1 low watermark (0x0 = 1 empty slot; 0x3f = 64 empty slots) 


11.1.15 3.1.15 vidChromaKeyMin Register 
The vidChromaKeyMin register contains the lower bound of the chroma key color. 


8-bit desktop color format 
16-bit desktop color format 
Blue value of the chroma-key 

24-bit desktop color format 

Blue value of the chroma-key 
Green value of the chroma-key 
Red value of the chroma-key 
Reserved 

32-bit desktop color format 

Blue value of the chroma-key 
Green value of the chroma-key 
Red value of the chroma-key 
Reserved 


11.1.16 3.1.16 vidChromaKeyMax Register 


The vidChromaKeyMax register contains the upper bound of the chroma key color. It is the same as 
vidChromaKeyMin if the chroma-key is a single color instead of a range. 


Bit 
Format same as vidChromaKeyMin Register 


11.1.17 3.1.17 vidInStatusCurrentLine Register 


The vidInStatusCurrentLine register contains the current scan out line. As the vertical beam scans down 
the display this register is incremented. 


Bit 
Current Video scan line. 


The vidInStatusCurrentLine register also allows the host to read the status of the video-in port, and 
implement manual buffer flipping for the video-in data. 


Description 

Even/odd field of the frame VMI has just finished drawing. 1=even; 0=odd. 
Video-in buffer VMI has just finished writing to. 
00=buffer 0 (as specified by vidInAddr0); 
01=buffer 1 (as specified by vidInAddr1); 
10=buffer 2 (as specified by vidInAddr2); 
11=No buffer is ready yet; the video processor is still working on the first frame 


11.1.18 3.1.18 vidScreenSize 


NOTE: Whenever the screen resolution is changed, video processor needs to be re-enabled by 
clearing vidProcCfg bit 0 and setting it to 1. This will reset the video processor. 


Bit    Description 


11:0 Width of the screen in number of pixels. If vidScreenX is specified to be bigger than 
1280, 2x mode needs to be enabled. 


22:12 Height of the screen in number of lines. 


11.1.19 3.1.19 vidOverlayStartCoords 


Bit    Description 
The x-coordinate on the screen where the upper left corner of the overlay is located. 
The y-coordinate on the screen where the upper left corner of the overlay is located. 

The lower two bits of the x-coordinate for the first pixel (at the upper left corner) of the 
overlay window with respect to the beginning of the source surface. Since the overlay 
window may be partially occluded by the dimension of the screen, the first pixel of the 
window may not necessarily be the first pixel of the source surface. The lower two bits of 
the x-coordinate are used for undithering, and are the same for both linear and tiled 
address space. 


The lower two bits of the y-coordinate for the first pixel (at the upper left corner) of the 
overlay window with respect to the beginning of the source surface. Since the overlay 
window may be partially occluded by the dimension of the screen, the first pixel of the 
window may not necessarily be the first pixel of the source surface. The lower two bits of 
the y-coordinate are used for undithering, and are the same for both linear and tiled 
address space. 


31:28 reserved 


11.1.20 3.1.20 vidOverlayEndScreenCoord 


Description 
The x-coordinate on the screen where the lower right corner of the overlay is located. 
The y-coordinate on the screen where the lower right corner of the overlay is located. 


11.1.21 3.1.21 vidOverlayDudx 


Step size in source per horizontal step in screen space for magnification. Format is 0.20. 


11.1.22 3.1.22 vidOverlayDudxOffsetSrcWidth 


Bit    Description 
18:0   Initial offset of Dudx. Format is 0.19. 
31:19  Number of bytes needed to be fetched from the source surface in order to cover a whole 
       un-occluded scanline for the overlay (14 bits allows a max of bytes for an overlay 
       scanline), i.e., 

       ((Overlay width in number of screen pixels * vidOverlayDudx) + 
       vidOverlayDudxOffset) * overlay pixel depth in bytes. 

       For a non-scaled overlay with no offset, vidOverlayDudx becomes 1, and 
       vidOverlayDudxOffset becomes 0 in the above equation. 
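
The source-width equation above can be evaluated as in the following sketch. The function name is hypothetical, dudx and the offset are passed as plain fractions rather than fixed-point register values, and rounding any fractional source pixel up to a whole pixel is an assumption that the text does not state.

```python
import math


def overlay_fetch_bytes(width_pixels, dudx, dudx_offset, bytes_per_pixel):
    """Bytes to fetch per un-occluded overlay scanline:
    ((screen width * dudx) + offset) * pixel depth in bytes."""
    source_pixels = width_pixels * dudx + dudx_offset
    return math.ceil(source_pixels) * bytes_per_pixel
```

For a 16 bpp overlay shown 1024 pixels wide with dudx = 0.625 and no offset, this gives 640 source pixels, or 1280 bytes.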


11.1.23 3.1.23 vidOverlayDvdy 


Step size in source per vertical step in screen space for magnification. Format is 0.20. 


11.1.24 3.1.24 vidOverlayDvdyOffset 


Bit    Description 
18:0   Initial offset of Dvdy. Format is 0.19. 

Example: 


Given a source size of 640 x 240, magnified to 1024 x 768 on the screen: 


Source width: 
vidOverlayDudxOffsetSrcWidth[31:19] = 640 x 2 bytes = 1280 bytes (here 16 bpp assumed) 
= 500h 


Dudx[19:0] = 640/1024 = 0.625 = a0000h 
(Note: format 0.20 means 0.xxxxxxxxxxxxxxxxxxxx, i.e. 20 fractional bits.) 


Dvdy[19:0] = 240/768 = 0.3125 = 0.25 + 0.0625 = 50000h 
(Format same as Dudx above) 


Dudx Offset[18:0] and Dvdy Offset[18:0] = 00000h if no initial offset is needed. 
If the upper leftmost overlay pixel needs to be the center of 
the first pixel of the overlay surface, both offsets need to be set to 0.5, 
which is 40000h. 
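
The fixed-point values in the example can be reproduced with a short helper. This is a sketch: `to_fixed` and `pack_dudx_offset_src_width` are hypothetical names, and truncating (rather than rounding) the fraction is an assumption that happens not to matter for the values used here.

```python
def to_fixed(value, frac_bits):
    """Convert a fraction in [0, 1) to an unsigned 0.N fixed-point value."""
    if not 0.0 <= value < 1.0:
        raise ValueError("value must be in [0, 1)")
    return int(value * (1 << frac_bits))


def pack_dudx_offset_src_width(src_width_bytes, dudx_offset):
    """Source width in bytes goes in bits [31:19], the 0.19 initial
    offset in bits [18:0]."""
    if not 0 <= src_width_bytes < (1 << 13):
        raise ValueError("source width must fit in bits [31:19]")
    return (src_width_bytes << 19) | to_fixed(dudx_offset, 19)


dudx = to_fixed(640 / 1024, 20)   # 0xA0000
dvdy = to_fixed(240 / 768, 20)    # 0x50000
```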


11.1.25 3.1.25 vidDesktopStartAddr 


25:0   Physical starting address of the desktop surface. This is a byte-aligned address. 


11.1.26 3.1.26 vidDesktopOverlayStride 


Bit    Description 
14:0   If the desktop surface resides in linear space, bit[14:0] contains the linear stride of the 
       surface in bytes. If interlaced video output mode is enabled, the linear stride is still 
       programmed to 1x the regular stride of the surface, and will be multiplied by 2 when 
       used. 

       If the desktop resides in tile space, bit[14:0] contains the tile stride of the region. This is 
       specified in number of tiles across the width of the tile address region, NOT the width of 
       the desktop surface. 

30:16  If the overlay surface resides in linear space, bit[30:16] contains the linear stride of the 
       overlay surface in bytes. If interlaced video output mode is enabled, the linear stride is 
       still programmed to 1x the regular stride of the surface, and will be multiplied by 2 when 
       used. 

       If the overlay resides in tile space, bit[30:16] contains the tile stride of the region. This is 
       specified in number of tiles across the width of the tile address region, NOT the width of 
       the overlay surface. 


For video overlay, the stride needs to be a multiple of 4 bytes for YUV 422 pixel format and a multiple 
of 8 bytes for YUV 411 pixel format. This ensures that the right edge of the video source surface falls on 
a boundary of 2 pixels for YUV 422 and 4 pixels for YUV 411. The start address for the overlay is sampled 
from the FIFO’ed leftOverlayBuf and rightOverlayBuf registers. The start address needs to be aligned on 
a 32-bit boundary for YUV 422 pixel format and a 64-bit boundary for YUV 411 pixel format. 
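
The stride and start-address rules for YUV overlays lend themselves to a simple validation step in driver code. The sketch below is illustrative only; the function name, the format strings and the idea of returning a list of problems are assumptions.

```python
# Required alignment in bytes for both the stride and the start address.
OVERLAY_ALIGNMENT = {"YUV422": 4, "YUV411": 8}


def check_overlay_layout(pixel_format, stride_bytes, start_address):
    """Return a list of rule violations for a YUV overlay surface."""
    alignment = OVERLAY_ALIGNMENT[pixel_format]
    problems = []
    if stride_bytes % alignment:
        problems.append(f"stride must be a multiple of {alignment} bytes")
    if start_address % alignment:
        problems.append(f"start address must be {alignment * 8}-bit aligned")
    return problems
```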


11.1.27 3.1.27 vidInAddr0 


Bit    Description 
25:0   Starting address of video-in buffer 0 


11.1.28 3.1.28 vidInAddr1 


Bit    Description 
25:0   Starting address of video-in buffer 1 


11.1.29 3.1.29 vidInAddr2 


Bit    Description 
25:0   Starting address of video-in buffer 2 


11.1.30 3.1.30 vidInStride 


If the video-in buffers reside in linear space, this register contains the linear stride of the 
buffer in bytes. If interlaced video input mode is enabled, the linear stride is still 
programmed to 1x the regular stride, and will be multiplied by 2 before it is used. 

If the video-in buffers reside in tile space, this register contains the tile stride of the 
region. This is specified in number of tiles across the width of the tile address region, 
NOT the width of the video-in buffers. 


11.1.31 3.1.31 vidCurrOverlayStartAddr 


The vidCurrOverlayStartAddr register allows the host to read the start address which the video processor is 
using to refresh the overlay window for the current frame. 


Bit Description 


| 25:0 | Start physical address the video processor is using to refresh the overlay window. Read only. 


11.2 3.2 Video-In Interface 


11.2.1 3.2.1 Function 


The Video-In processor supports several connector interfaces for video data input. The following table 
shows the signals needed for each interface. 


11.2.2 3.2.2 Signals 


| VMI Video Port | | Hsyncin | Hsyncin | Hsyncin 
| VMI Video Port | | Vsyncin | Vsyncin | Vsyncin 
| VMI Video Port | | Vactive | Vactive | Vactive 
| VMI Video Port | | P[7:0] | P[7:0] | P[7:0] 
| VMI Video Port | | Pixclk (in) | Pixclk (in) | Pixclk (in) 
| VMI I2C Port | | SDA (in/out) | 
| VMI I2C Port | | SCK (in/out) | 
| VMI Host Port | | D[7:0] (in/out) | D[7:0] 
| VMI Host Port | | A[3:0] (out) | A[3:0] 
| VMI Host Port | | cs (out) | cs 
| VMI Host Port | | ds (out) | rd 
| VMI Host Port | | rw (out) | wren 
| VMI Host Port | | dtack (in) | ready 
| DDC Port | SDA (in/out) | 
| DDC Port | SCK (in/out) | 
| System signals | | vmi_reset_n (out) | vmi_reset_n (out) | vmi_reset_n (out) 
| System signals | | vmi_int_n (in) | vmi_int_n | vmi_int_n 
| System signals | | vmi_present_n (in) | vmi_present_n | vmi_present_n 


A. Video-In Interface: 


General Description 


When video data arrive through the Video-In interface, they undergo optional decimation and filtering 
and are packed into 128-bit words in a FIFO before being written to memory. Because writes to memory are 
always aligned on a 128-bit boundary, the appropriate byte enables must also be set with the writes. 
Supported pixel formats for the video-in data are YUV422 and YUV411. Both pixel formats are stored at 16 
bits per pixel, which means that 4 bits are unused per pixel in the case of YUV411. 


Video data are stored in the Video-In frame buffers whose starting addresses are specified by the registers 
VidInAddr0, VidInAddr1, and VidInAddr2. VidInAddr1 and VidInAddr2 are used for double and triple 
buffering to avoid video tearing. However, since video arrives at a different rate from the video 
refresh, switching of the video-in drawing buffers is not synchronous to the Vsync of the video refresh. At 
the end of each VMI frame, the vmi_int input signal is asserted. The video processor then switches 
to the next video-in frame buffer for the next VMI frame if multiple buffering is enabled. If it is 
disabled, the same video-in frame buffer is overwritten. At the same time, the video processor updates the 
VidInStatus register, which indicates which buffer (0, 1, or 2) VMI has just finished drawing and whether 
that buffer contains an even or odd field. An interrupt then signals the host to flip the display buffer 
for the video-in data. Alternatively, if the “Video_in data displayed as overlay enable” bit in VidProcCfg 
is set, the video processor performs the display buffer flipping automatically for the overlay, provided 
that all the corresponding configuration registers for the overlay are set up correctly (e.g., overlay 
surface enable, overlay pixel format, overlay_dudx, etc.). 

If Weave video-in deinterlaced mode is enabled, the video processor detects the even/odd field from 
VREF (Vsync) and HREF (Hsync). If the field is odd, the specified VidInAddr register is used as the 
starting address of the video-in frame buffer. If the field is even, VidInStride is used as the starting 
address offset and is added to the specified VidInAddr. The video-in buffer is switched at every other 
Vsync. VidInStride should be programmed with a value equal to 1X the regular line stride regardless of 
whether the video-in data is interlaced or not. 
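
The Weave addressing rule can be illustrated with a short sketch. The function names are hypothetical. 
The 2x step between successive lines of one field is an assumption drawn from the vidInStride 
description (the stride is multiplied by 2 when interlaced input is enabled), not a statement of the 
exact hardware datapath. 

```c
#include <stdbool.h>
#include <stdint.h>

/* Start address of the field being captured in Weave mode.
 * buf_addr : value of the selected VidInAddr register
 * stride   : VidInStride, programmed to 1X the regular line stride
 * even     : true for an even field, false for an odd field */
static uint32_t weave_field_start(uint32_t buf_addr, uint32_t stride, bool even)
{
    return even ? buf_addr + stride : buf_addr;
}

/* Address of line n (0-based) within one field. Lines of the same field are
 * assumed to be 2 * stride apart so the other field interleaves between them. */
static uint32_t weave_line_addr(uint32_t buf_addr, uint32_t stride, bool even, uint32_t n)
{
    return weave_field_start(buf_addr, stride, even) + n * 2u * stride;
}
```

With a buffer at 0x100000 and a stride of 1440 bytes, the odd field starts at 0x100000 and the even 
field starts 1440 bytes later. 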


1. VMI 
-data: 


An 8-bit YCbCr interface is used. The data format is CCIR-656 YCbCr 422, and pixels arrive in the order 
(Cb0[7:0] or U0[7:0]) -> Y0[7:0] -> (Cr0[7:0] or V0[7:0]) -> Y1[7:0]. 
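
The byte ordering above implies that every four bytes carry two pixels sharing one chroma pair. A 
minimal host-side sketch of unpacking such a stream follows. The type and function names are 
hypothetical, and the code illustrates only the ordering, not the hardware packing into 128-bit words. 

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel type for illustration. */
typedef struct { uint8_t y, cb, cr; } ycbcr_pixel;

/* Unpack a Cb0, Y0, Cr0, Y1 byte stream into pixels. Each complete 4-byte
 * group yields two pixels that share one Cb/Cr pair. Returns the number of
 * pixels written; trailing bytes that do not form a full group are ignored. */
static size_t unpack_ycbcr422(const uint8_t *in, size_t nbytes, ycbcr_pixel *out)
{
    size_t npix = 0;
    for (size_t i = 0; i + 4 <= nbytes; i += 4) {
        uint8_t cb = in[i], y0 = in[i + 1], cr = in[i + 2], y1 = in[i + 3];
        out[npix++] = (ycbcr_pixel){ y0, cb, cr };
        out[npix++] = (ycbcr_pixel){ y1, cb, cr };
    }
    return npix;
}
```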


Video data may be interlaced. 
-timing: 
Timing signals include VREF, HREF, VACTIVE, and PIXCLOCK. 


VREF and HREF are active high VSYNC and HSYNC. If HREF is high during the falling edge of VREF, 
the field is even. If HREF is low at that time, the field is odd. 


VACTIVE is a blanking signal which indicates pixel data is valid across the YCbCr bus. 


11.3 3.3 Video Limitation 


1. In 1x mode, 3 streams of pixel fetching will consume more memory bandwidth than available for 32- 
bit desktop. This means chroma-keying and bilinear filtering cannot be turned on simultaneously for 
32-bit desktop. 


2. In 2x mode (for any display larger than 1280 X 1024) where we refresh 2 screen pixels per 
cycle at 110MHz, bilinear filtering is not supported. All backend zoom (magnification) is done 
by point sampling (replication). 


3. 1-10X backend zoom (magnification) with increments of 0.1X. Larger magnification is 
supported, but with bigger increments. 


1 to 1/16X video-in decimation (minimization) with increments of 0.015X. 


4. Retain the 3-bit tap filter for RGB565 dithered as an alternative 
to the 2x2 box filter. 


5. Interlaced video output mode is not implemented. 


6. Hwcursor is 2 color only. 

7. YUV 411 pixel format will be stored as unpacked in the frame buffer. This means each pixel 
will occupy 16 bits instead of 12 bits. This makes pixel extraction easier, but consumes more 
memory. 


8. Video with YUV 422 format needs to be stored on a 4-byte memory boundary while YUV 411 
on a 8-byte boundary. This is necessary because UV are shared between 2 pixels in 422 
while UV are shared between 4 pixels in 411. 
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12. 


13. Command Transport Protocol 


13.1 Command Transport 


A command FIFO (CMDFIFO) may be established by software within frame buffer memory or AGP 
memory. Writes to the linear frame buffer address space are performed to build a command buffer, which 
is then parsed and executed by the accelerator. To accommodate host CPUs which may issue writes out of 
order (e.g. Intel’s Pentium Pro), one of two scenarios will occur: either the CMDFIFO resides in AGP 
(non-local video memory) and software manages the accelerator’s internal CMDFIFO depth register, or the 
CMDFIFO resides in frame buffer memory and the accelerator manages the internal CMDFIFO depth 
register. 


If the CMDFIFO resides in AGP space (non-local video memory), software “BUMPS” the internal 
CMDFIFO depth register after writing N words into the AGP buffer. This allows the CPU to write to the 
CMDFIFO in any order, flush any pending writes in the CPU’s internal write buffers and the core logic 
chipset’s internal write buffers, and then update the accelerator’s depth register. Since writes to the 
CMDFIFO will be in consecutive order, the CPU’s write buffers will fill and burst into memory more 
efficiently than random PCI writes. 


If the CMDFIFO resides in frame buffer memory, software writes to the frame buffer in consecutive order, 
and the CPU flushes its write buffer in any order to the accelerator. The accelerator counts the number of 
unwritten addresses; once consecutive addresses are written, the internal CMDFIFO depth register is 
updated to the last consecutive written address. Counting unwritten addresses allows the CPU to flush its 
internal write buffers in any order while the correct order is maintained in the frame buffer memory. 
Software must manage the circular buffer at the point where the buffer recycles to the beginning. This is 
done by placing a JMP instruction (CMDFIFO Packet Type 0, Func 100) at the bottom of the fifo to restart 
at the beginning of the CMDFIFO space. 


13.1.1 CMDFIFO Management 


The CMDFIFO mechanism supports 2 types of fifo management: software and hardware. When the 
CMDFIFO is located in frame buffer memory, either software management or hardware management can 
be used. When the CMDFIFO is located in AGP memory, only software management is supported. 


13.1.1.1 Software Management of CMDFIFO 


Software manages the CMDFIFO “emptiness.” The accelerator maintains a read pointer and a depth for 
the CMDFIFO. Accelerator reads from the CMDFIFO decrement the depth register and increment the read 
pointer. The accelerator will automatically execute data from the CMDFIFO as long as the internal 
CMDFIFO depth register is greater than zero. When the CPU is ready to inform the accelerator that more 
data is available in the CMDFIFO, the CPU writes the number of 32-bit words that have been added to the 
end of the CMDFIFO. The accelerator then adds the value written by the CPU to the internal depth 
register. 

The accelerator’s internal registers define where the circular CMDFIFO exists in frame buffer memory by 
defining a beginning address for the CMDFIFO and a rollover address. By default, the CMDFIFO internal 
read pointer is set to the beginning address for the CMDFIFO. Once data is stored in the CMDFIFO (and 
the internal depth register is incremented by the CPU), the CMDFIFO read pointer will increment as the 
accelerator parses and executes the CMDFIFO. When the read pointer equals the rollover address defined 
by initialization registers, the read pointer will jump back to the beginning CMDFIFO address. The 
CMDFIFO is thus programmable in size as a circular space from 1 to N 4k byte pages. Software must 
manage CMDFIFO “fullness” and guarantee that the CMDFIFO does not overflow. On systems like the 
Intel Pentium Pro, software must place a fence after the last memory write, but before the write that 
increases the number of new entries in the CMDFIFO. 
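
The ordering requirement for software management (write the words, fence, then bump) can be sketched 
as follows. This is an illustration under stated assumptions: the structure and function names are 
hypothetical, the pointers stand for memory-mapped regions that a real driver would obtain from the 
operating system, and the C11 fence is only a stand-in for whatever platform-specific store fence the 
host CPU and chipset actually require. Wrap-around handling is left to the caller. 

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side view of a software-managed CMDFIFO. */
typedef struct {
    volatile uint32_t *fifo;      /* mapped CMDFIFO memory */
    volatile uint32_t *bump_reg;  /* mapped cmdBump register (write only) */
    uint32_t size_words;          /* CMDFIFO size in 32-bit words */
    uint32_t write_idx;           /* next word index the CPU will write */
} cmdfifo_sw;

/* Copy n words into the CMDFIFO, fence, then tell the accelerator how many
 * 32-bit words were added. Returns false if the words would run past the
 * end of the buffer; the caller is expected to emit a jump packet first. */
static bool cmdfifo_submit(cmdfifo_sw *f, const uint32_t *words, uint32_t n)
{
    if (n > f->size_words - f->write_idx)
        return false;
    for (uint32_t i = 0; i < n; i++)
        f->fifo[f->write_idx + i] = words[i];
    f->write_idx += n;
    /* Stand-in for the fence that must precede the bump write. */
    atomic_thread_fence(memory_order_seq_cst);
    *f->bump_reg = n;
    return true;
}
```

Software would still need to track the accelerator's read pointer separately to guarantee that the 
CMDFIFO does not overflow. 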


13.1.1.2 Hardware Management of CMDFIFO 


Hardware manages the CMDFIFO “emptiness.” The accelerator maintains a read pointer, write pointer, 
and depth for the CMDFIFO. Accelerator reads from the CMDFIFO decrement the depth register and 
increment the read pointer. The accelerator will automatically execute data from the CMDFIFO as long as 
the internal CMDFIFO depth register is greater than zero. The CPU writes data into the CMDFIFO area in 
sequential addresses. The accelerator snoops the writes into the CMDFIFO area and examines the 
addresses, looking for non sequential addresses or “holes.” When the accelerator gathers sequential 
addresses present in the CMDFIFO, the depth and write pointers are incremented. The accelerator’s 
internal registers define where the circular CMDFIFO exists in frame buffer memory by defining a 
beginning address for the CMDFIFO and a rollover address. By default, the CMDFIFO internal read 
pointer is set to the beginning address for the CMDFIFO. Once data is stored in the CMDFIFO (and the 
internal depth register is incremented by the CPU), the CMDFIFO read pointer will increment as the 
accelerator parses and executes the CMDFIFO. When the read pointer equals the rollover address defined 
by initialization registers, the read pointer will jump back to the beginning CMDFIFO address. The 
CMDFIFO is thus programmable in size as a circular space from 1 to N 4k byte pages. Software must 
manage CMDFIFO “fullness” and guarantee that the CMDFIFO does not overflow. On systems like the 
Intel Pentium Pro, software must place a fence after the last memory write, but before the first write to the 
top of the CMDFIFO. Software may not write fewer than four 32-bit entries before performing a jump to the 
beginning of the buffer. 


13.1.2 CMDFIFO Data 


All CMDFIFO data packets begin with a 32-bit packet header which defines the data which follows. There 
are 7 different types of CMDFIFO packet headers (types 0 through 6). Bits (2:0) of a CMDFIFO packet 
header define the packet header type. All CMDFIFO packet headers and data must be 32-bit words; byte and 
16-bit short writes are not allowed in the CMDFIFO. 


13.1.3 CMDFIFO Packet Type 0 


CMDFIFO Packet Type 0 is a variable length packet, requiring a minimum of one 32-bit word and a 
maximum of two 32-bit words. CMDFIFO Packet Type 0 is used to jump to the beginning of the fifo when 
the end of the fifo is reached. CMDFIFO Packet Type 0 also supports jumping to a secondary command 
stream, like a jump-to-subroutine call (jsr instruction), with a CMDFIFO packet that instructs a return as 
well. The NOP, JSR, RET, and JMP LOCAL FRAME BUFFER functions require only a single 32-bit word 
CMDFIFO packet, while the JMP AGP function requires a two-word CMDFIFO packet. Bits 31:29 
are reserved and must be written with 0. 


CMDFIFO Packet Type 0 


word 0 Address [24:2] | Func | 000 | 
word 1 Address [35:25] 


Code Function 


000 NOP 


JMP LOCAL FRAME BUFFER 
JMP AGP 
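
As a rough sketch of the word 0 layout, a single-word Packet Type 0 header could be assembled as 
shown here. The placement of Func in bits 5:3 and Address[24:2] in bits 28:6 is an assumption inferred 
from the field order in the diagram and from bits 31:29 being reserved. The function code is passed in 
by the caller because the code table is only partly legible in this copy. The function name is 
hypothetical. 

```c
#include <stdint.h>

/* Build word 0 of a CMDFIFO Packet Type 0.
 * func      : 3-bit function code (assumed to sit in bits 5:3)
 * byte_addr : byte address; only bits 24:2 are encoded (assumed bits 28:6)
 * Bits 31:29 are left at 0 as required, and bits 2:0 hold packet type 0. */
static uint32_t pkt0_header(uint32_t func, uint32_t byte_addr)
{
    uint32_t addr_field = (byte_addr >> 2) & 0x7FFFFFu;  /* Address[24:2], 23 bits */
    return (addr_field << 6) | ((func & 0x7u) << 3) | 0x0u;
}
```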


13.1.4 CMDFIFO Packet Type 1 


CMDFIFO Packet Type 1 is a variable length packet that allows writes to either a common address or to 
consecutive addresses. The minimum packet length is 2 32-bit words, and the maximum is 65536 words. 
Bits 31:16 define the number of words that follow word 0 of packet type 1, and must be 
greater than 0. When bit 15 is a 1, data following word 0 in the packet is written to consecutive addresses 
starting from the register base address defined in bits 14:3. When bit 15 is a 0, data following word 0 is 
written to the base address. Packet header bits 14:3 define the base address of the packet; see the section 
below. The common use of packet type 1 is host blits. 


CMDFIFO Packet Type 1 


31 16 
word 0 Number of words Register Base (See below) 


word 1 Data 


word N Optional Data N 
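
The Packet Type 1 header fields described above can be packed as in the following sketch. The function 
name is hypothetical, and the value 001 in bits 2:0 assumes that the type field simply holds the packet 
type number. 

```c
#include <stdbool.h>
#include <stdint.h>

/* Build word 0 of a CMDFIFO Packet Type 1.
 * nwords    : number of data words following word 0 (bits 31:16, must be > 0)
 * increment : true writes to consecutive addresses (bit 15 = 1),
 *             false writes every word to the base address (bit 15 = 0)
 * reg_base  : register base field (bits 14:3) */
static uint32_t pkt1_header(uint32_t nwords, bool increment, uint32_t reg_base)
{
    return ((nwords & 0xFFFFu) << 16)
         | ((increment ? 1u : 0u) << 15)
         | ((reg_base & 0xFFFu) << 3)
         | 0x1u;
}
```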


Register base: 


13.1.5 CMDFIFO Packet Type 2 


CMDFIFO Packet Type 2 is a variable length packet, requiring a minimum of 2 32-bit words and a 
maximum of 30 32-bit words for the complete packet. The base address for CMDFIFO Packet Type 2 is 
defined to be offset 8 of the hardware 2D registers (clip0min). The first 32-bit word of the packet defines 
individual write enables for up to 29 data words to follow. From LSB to MSB of the mask, a “1” enables 
the write and a “0” disables the write. The sequence of up to 29 32-bit data words following the mask 
modifies addresses equal to the implied base address plus N where mask[N] equals “1” as N ranges from 0 
to 28. The total number of 32-bit data words following the mask is equal to the number of “1”s in the 
mask. The register mask must not be 0. 


CMDFIFO Packet Type 2 


31 3 
word 0 2D Register mask 
word 1 Data 
word N Optional Data N 
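
The mask-driven layout of Packet Type 2 means only the enabled registers contribute data words. A 
minimal sketch of assembling such a packet is given below. The function name and the 29-entry shadow 
array are hypothetical, and the value 010 in bits 2:0 assumes that the type field holds the packet type 
number. 

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a CMDFIFO Packet Type 2 into out[].
 * mask29 : write enables for registers 0..28 relative to the implied base
 * regs   : shadow values for those 29 registers
 * out    : destination buffer, at least 30 words
 * Returns the total packet length in words (header plus data), or 0 when
 * the mask is empty, which the packet format does not allow. */
static size_t pkt2_build(uint32_t mask29, const uint32_t regs[29], uint32_t *out)
{
    size_t n = 0;
    mask29 &= 0x1FFFFFFFu;              /* mask occupies header bits 31:3 */
    if (mask29 == 0)
        return 0;
    out[n++] = (mask29 << 3) | 0x2u;
    for (unsigned i = 0; i < 29; i++)   /* LSB to MSB of the mask */
        if (mask29 & (1u << i))
            out[n++] = regs[i];
    return n;
}
```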


13.1.6 CMDFIFO Packet Type 3 


CMDFIFO Packet Type 3 is a variable length packet, requiring a minimum of 3 32-bit words and a 
maximum of 16 vertex data groups for the complete packet, where a data group is all the register writes 
specified in the parameter mask. Bits 9:6 must be greater than 0. The base address 
for CMDFIFO Packet Type 3 is defined to be the starting address of the hardware triangle setup registers. 
The first 32-bit word of the packet defines 16 individual vertex data. Bits 31:29 of word 0 define 0 to 7 
dummy fifo entries following the packet type 3 data. The sSetupMode register is written with the data in 
bits 27:10 of word 0. Bits 9:6 define the number of vertex writes contained in the packet, where the total 
packet size becomes what is defined in the parameter mask multiplied by the number of vertices. During 
parsing and execution of a CMDFIFO Packet Type 3, a specific action takes place based on bits 5:3. The 
sSetupMode register implies that X and Y are present in words 1 and 2. When bit 28 is set, packed 
color data follows the X and Y values; otherwise independent red, green, blue, and alpha follow the X and 
Y data. When Smode field is 0, then word 0 defines X, and word 1 defines Y. 


Code 000 specifies an independent triangle packet, where an implied sBeginTriCMD is written after 2 
sDrawTriCMDs. The sequence is sBeginTriCMD, sDrawTriCMD, sDrawTriCMD, 
sBeginTriCMD, and so on until “NumVertex” vertices have been executed. 


Code 001 specifies the beginning of a triangle strip. An implicit write to sBeginTriCMD is issued, followed 
by NumVertex sDrawTriCMD writes. The sequence is sBeginTriCMD, sDrawTriCMD, 
sDrawTriCMD, sDrawTriCMD, and so on until “NumVertex” vertices have been executed. 


Code 010 specifies the continuation of an existing triangle strip. An implicit write to sDrawTriCMD is 
performed after one complete vertex has been parsed. 


CMDFIFO Packet Type 3 


31:29 | 28 | 27:10 | 9:6 | 5:3 | 2:0 
word 0 


word 1 Data 


word N Optional Data N 


Code Command 


000 Independent Triangle 
001 Start of a triangle strip 
010 Continuation of an existing triangle strip 


Bit sParamMask field description 
sSetupMode field 


Culling Sign (0=positive sign, 
1=negative sign) 


25 Disable ping pong sign correction 
during triangle strips (0=normal, 
1=disable) 


Parameter 


Z (optional) 
Wbroadcast (optional) 


Sequence of implied commands for Each code follows: 
M = Mode register write 

B = sBeginTriCMD 

D =sDrawTriCMD 

Code 000: MBDDBDDBDDBDD ... 

Code 001: MBDDDDDDDDDDD ... 

Code 010: MDDDDDDDDDDDD ... 
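
The header fields and the implied command sequences listed above can be reproduced with a short 
sketch. Both function names are hypothetical, the value 011 in bits 2:0 assumes that the type field 
holds the packet type number, and the sequence generator simply mirrors the M/B/D listings rather than 
modelling the setup unit. 

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build word 0 of a CMDFIFO Packet Type 3.
 * pad          : dummy fifo entries after the data (bits 31:29)
 * packed_color : packed color follows X and Y (bit 28)
 * setup_mode   : value written to sSetupMode (bits 27:10)
 * num_vertex   : number of vertices, must be > 0 (bits 9:6)
 * cmd          : command code 000, 001 or 010 (bits 5:3) */
static uint32_t pkt3_header(uint32_t pad, bool packed_color, uint32_t setup_mode,
                            uint32_t num_vertex, uint32_t cmd)
{
    return ((pad & 0x7u) << 29)
         | ((packed_color ? 1u : 0u) << 28)
         | ((setup_mode & 0x3FFFFu) << 10)
         | ((num_vertex & 0xFu) << 6)
         | ((cmd & 0x7u) << 3)
         | 0x3u;
}

/* Write the implied command sequence for one packet into out as a string:
 * M = mode register write, B = sBeginTriCMD, D = sDrawTriCMD.
 * out must hold at least num_vertex + 2 characters. */
static void pkt3_sequence(uint32_t cmd, uint32_t num_vertex, char *out)
{
    size_t n = 0;
    out[n++] = 'M';
    for (uint32_t i = 0; i < num_vertex; i++) {
        bool begin = (cmd == 0u && i % 3u == 0u)   /* every third vertex starts a triangle */
                  || (cmd == 1u && i == 0u);       /* only the first vertex starts the strip */
        out[n++] = begin ? 'B' : 'D';
    }
    out[n] = '\0';
}
```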


13.1.7 CMDFIFO Packet Type 4 


CMDFIFO Packet Type 4 is a variable length packet, requiring a minimum of 2 32-bit words and a 
maximum of 22 32-bit words for the complete packet. Bits 31:29 of word 0 define the number 
of pad words that follow the packet type 4 data. The next 14 bits of the header, 28:15, define the register 
write mask, followed by the register base field, described later in this section. From LSB to MSB of the 
mask, a “1” enables the write and a “0” disables the write. Because the mask is 14 bits wide, the sequence 
of up to 14 32-bit data words following the mask modifies addresses equal to the implied base address 
plus N where mask[N] equals “1” as N ranges from 0 to 13. The total number of 32-bit data words 
following the mask is equal to the number of “1”s in the mask. The general register mask must have a 
nonzero value. 
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CMDFIFO Packet Type 4 


31:29 | 28:15 
word 0 General Register mask Register Base (See below) 


word 1 Data 
word N Optional Data N 
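
A sketch of the Packet Type 4 header and its resulting length follows. The function names are 
hypothetical. Placing the register base in bits 14:3 is an assumption made by analogy with Packet Type 
1, and the value 100 in bits 2:0 assumes that the type field holds the packet type number. 

```c
#include <stdint.h>

/* Build word 0 of a CMDFIFO Packet Type 4.
 * pad      : number of pad words after the data (bits 31:29)
 * mask14   : general register write mask (bits 28:15), must be nonzero
 * reg_base : register base field (assumed to occupy bits 14:3) */
static uint32_t pkt4_header(uint32_t pad, uint32_t mask14, uint32_t reg_base)
{
    return ((pad & 0x7u) << 29)
         | ((mask14 & 0x3FFFu) << 15)
         | ((reg_base & 0xFFFu) << 3)
         | 0x4u;
}

/* Total packet length in words: header, one data word per set mask bit,
 * and the trailing pad words. */
static unsigned pkt4_total_words(uint32_t pad, uint32_t mask14)
{
    unsigned ones = 0;
    for (uint32_t m = mask14 & 0x3FFFu; m != 0; m >>= 1)
        ones += m & 1u;
    return 1u + ones + (pad & 0x7u);
}
```

With a full 14-bit mask and 7 pad words the total is 22 words, which agrees with the maximum stated for 
the complete packet. 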


Register base: 


13.1.8 CMDFIFO Packet Type 5 


CMDFIFO Packet Type 5 is a variable length packet, requiring a minimum of 3 32-bit words and a 
maximum of 2^19 32-bit words for the complete packet. Bits 31:30 define the LFB type, one of linear frame 
buffer, planar YUV space, 3D LFB, or texture download space. Bits 29:26 in word 0 define the byte 
enables for word 2, and are active low. Bits 25:22 in word 0 define the byte enables for word N. Data 
must be in the correct data lane, and the base address must be 32-bit aligned. CMDFIFO Packet Type 5 is 
used to transfer large consecutive quantities of data from the CPU to the accelerator’s frame buffer in 
proper order with the command stream. Note that transfer to tile space is limited if the tile stride does 
not match the PCI stride. Tile space rows are not contiguous, so each tile row must be sent in a separate 
packet. 


NOTE: when downloading into the texture download space, the aperture is 4MB wide. The lower 2M of 
this space is sent into the TMU0 download port, while the upper 2M is sent to the TMU1 download port. 
Downloads to either TMU are not guaranteed to be synchronous with each other. Please refer to the 
“maintaining texture cache coherency” section of this spec. 


Downloads to the TMU0 alias should have a BaseAddress[26:0] starting at 0x000000, while downloads to 
TMU1 should have a BaseAddress[26:0] of 0x200000. When the hardware receives a CMDFIFO Packet 5 
command to download into texture space, the value 0x600000 is added to the downloads generated in order 
to normalize the downloads to Napalm Address Space. 


CMDFIFO Packet Type 5 


31:30 | 29:26 | 25:22 
word 0 Byte Enable W2 | Byte Enable WN 
word 1 BaseAddress [26:0] 


word 2 Data 


word N Optional Data N 


Planar YUV 
3D LFB 
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13.1.9 CMDFIFO Packet Type 6 


CMDFIFO Packet Type 6 is a fixed length packet requiring 5 32-bit words for the complete packet. 
CMDFIFO Packet Type 6 is primarily used to transfer data from system AGP memory into frame buffer 
local memory. Bits 20:5 of word 0 define the transfer size in bytes of an AGP transfer. Bits 4:3 define the 
destination memory space: LFB, Planar YUV, 3D LFB, or texture port. Word 1 bits 31:0 and word 2 bits 
27:24 define the source system AGP address of the data move. Word 2 bits 23:12 define the stride, and 
bits 11:0 define the width of the source surface in AGP memory. Word 3 defines the destination frame 
buffer address, while word 4 defines the stride of the destination surface. 


CMDFIFO Packet Type 6 


word 0 
word 1 
word 2 
word 3 
word 4 


Type Space 


00 Linear FB 
01 Planar YUV 
10 3D LFB 
11 Texture port 
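
The five words of a Packet Type 6 can be assembled from the field descriptions as in this sketch. The 
structure and function names are hypothetical. The bit positions within words 3 and 4 are not given in 
the text, so those words are passed through unchanged, and the value 110 in bits 2:0 assumes that the 
type field holds the packet type number. 

```c
#include <stdint.h>

/* Hypothetical container for the fixed 5-word packet. */
typedef struct { uint32_t w[5]; } pkt6;

/* Build a CMDFIFO Packet Type 6.
 * xfer_bytes  : transfer size in bytes (word 0 bits 20:5)
 * dest_space  : 0=linear FB, 1=planar YUV, 2=3D LFB, 3=texture port (bits 4:3)
 * agp_addr    : 36-bit source AGP address (word 1, and word 2 bits 27:24)
 * src_stride  : source stride (word 2 bits 23:12)
 * src_width   : source width (word 2 bits 11:0)
 * dest_addr   : destination frame buffer address (word 3, passed verbatim)
 * dest_stride : destination stride (word 4, passed verbatim) */
static pkt6 pkt6_build(uint32_t xfer_bytes, uint32_t dest_space, uint64_t agp_addr,
                       uint32_t src_stride, uint32_t src_width,
                       uint32_t dest_addr, uint32_t dest_stride)
{
    pkt6 p;
    p.w[0] = ((xfer_bytes & 0xFFFFu) << 5) | ((dest_space & 0x3u) << 3) | 0x6u;
    p.w[1] = (uint32_t)(agp_addr & 0xFFFFFFFFu);
    p.w[2] = ((uint32_t)((agp_addr >> 32) & 0xFu) << 24)
           | ((src_stride & 0xFFFu) << 12)
           | (src_width & 0xFFFu);
    p.w[3] = dest_addr;
    p.w[4] = dest_stride;
    return p;
}
```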


13.1.10 Miscellaneous 


Napalm supports two full CMDFIFO streams, and each can individually be located in frame buffer memory 
or AGP space. Each CMDFIFO has its own base address register set that defines the starting address, 
memory space, and size of the CMDFIFO. The CMDFIFO registers contain a write-only bump register 
that increments the write pointer by the amount written to the cmdBump register. Each CMDFIFO 
contains a read pointer, write pointer, and freespace count of the fifo itself, so the CPU can monitor the 
progress and fullness of the CMDFIFO. Ordering between the two CMDFIFOs is first come, first served. 


14. AGP/CMD Transfer/Misc Registers 
Memory Base 0: Offset 0x0080000 


Register Name Address | Bits | R/W Description 

AGP 
| agpGraphicsAddress | 0x00C(12) | 26:0 | R/W | Graphics address bits 26:0 
| agpGraphicsStride | 0x010(16) | 14:0 | R/W | Graphics stride 
| agpMoveCMD | | | n/a | 

CMDFIFO 0 
| cmdBaseAddr0 
| cmdBaseSize0 
| cmdBump0 
| cmdRdPtrL0 
| cmdRdPtrH0 

CMDFIFO 1 
| cmdBaseAddr1 | | | | Base Address of CMDFIFO 1 
| cmdBaseSize1 | | | | CMDFIFO1 size 
| cmdBump1 | 0x058(88) | 15: | | Bump CMDFIFO 1 N words 
| cmdRdPtrL1 | 0x05c(92) | | | CMDFIFO 1 read pointer lower 32 bits 

| cmdFifoThresh | 0x060(96) | | | CMDFIFO fetch threshold 
| cmdHoleInt | 0x064(100) | | | Cmd hole timeout value 

| yuvBaseAddress | | | | YUV planar base address 
| yuvStride | | | | Y, U and V planes stride value 


14.1 agpReqSize 


agpReqSize defines the AGP packet transfer size. The maximum transfer size is a 1-Mbyte block of data. 
This register is read write and defaults to 0x0. 


[19:0 re 
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14.2. agpHostAddressLow 

During AGP transfers this address defines the source address bits 31:0 of AGP memory to fetch data from. 
AGP addresses are 36-bits in length and are byte aligned. The upper 4 bits reside in the 
agpHostAddressHigh register. This register is read write, and defaults to 0. 


Bit Description 


Lower 32 bits of AGP memory. Default is 0x0. 


Revision 1.13 
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14.3. agpHostAddressHigh 


The agpHostAddressHigh defines the stride, width, and upper 4-bits of source AGP address, during AGP 
transfers. Stride and width are defined in quadwords. This register is read write, and defaults to 0. 


Bit Description 


AGP Width. Default is 0x0. 


27:14 AGP Stride. Default is 0x0. 


31:28 Upper 4 bits of AGP memory. Default is 0x0. 


14.4 agpGraphicsAddress 

agpGraphicsAddress defines the destination frame buffer address and type of the AGP transfer. At the 
beginning of an AGP transfer this address is loaded into an internal address pointer that increments for each 
data word received over AGP. This register is read write, and defaults to 0. 


Bit Description 
26:0 Frame buffer offset. Default is 0x0. 


14.5 agpGraphicsStride 


agpGraphicsStride defines the destination stride in bytes of the AGP transfer. Stride is in multiples of 
bytes. This register is read write, and defaults to 0. 


Bit 
Frame buffer Stride. Default is 0x0. 


14.6 agpMoveCMD 


agpMoveCMD starts an AGP transfer. When the transfer is started, agpHostAddress is loaded into the source 
pointer and agpGraphicsAddress is loaded into the destination pointer. The source pointer is incremented 
after data is fetched from AGP memory and written into the frame buffer memory location addressed by the 
destination pointer. The destination pointer is then incremented after the data has been written. This 
register is write only and has no default. 
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Description 


Reserved 
Dest memory type (0=Linear FB, 1=planar YUV, 2=3D LFB, 3=texture port). Default is 0x0. 
Command stream ID. This bit defines which command fifo is used for a host-initiated 
AGP data move. Default is 0x0. 


15. Command Fifo Registers 


[Figure: Linear Memory Contents, showing written and unwritten locations] 


The command registers define the location, size, and fifo management method of the command fifo. The 
command fifo starts at the address defined in the cmdBaseAddr[01] register and occupies N 4k byte pages 
defined in the cmdBaseSize register. The command fifo can be located either in AGP or frame buffer 
memory, which is defined in the cmdBaseSize register. CmdRdPtr points to the last executed entry in 
the command fifo. Amin is a pointer that walks through the fifo until it reaches an unwritten location. The 
rdptr cannot access any entry beyond the amin pointer. The amax pointer is set to the furthest address 
location of a given write. The hole counter is the number of unwritten locations between the 
amax register and the amin register. When the hole counter is zero, the amin register is set to the value of 
the amax register, thus allowing the read pointer to advance to the new amin register value. The depth of 
the fifo is calculated as the difference between amax and rdptr. 
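
The interaction of amin, amax and the hole counter can be modelled in software. The following is a 
simplified sketch under stated assumptions: it uses word indices instead of byte addresses, ignores 
wrap-around, assumes each location is written exactly once, and starts the pointers one entry before 
the base, mirroring the initialization to cmdBaseAddr - 4. All names are hypothetical. 

```c
#include <stdint.h>

/* Simplified model of the hardware hole tracking. */
typedef struct {
    int32_t rdptr;  /* last executed entry */
    int32_t amin;   /* last entry known to have no unwritten location before it */
    int32_t amax;   /* furthest entry written so far */
    int32_t holes;  /* unwritten locations between amin and amax */
} hole_tracker;

static void tracker_init(hole_tracker *t)
{
    t->rdptr = t->amin = t->amax = -1;
    t->holes = 0;
}

/* Record a CPU write to word index idx, in whatever order it arrives. */
static void tracker_write(hole_tracker *t, int32_t idx)
{
    if (idx > t->amax) {
        t->holes += idx - t->amax - 1;  /* skipped locations become holes */
        t->amax = idx;
    } else if (idx > t->amin) {
        t->holes--;                     /* a previously skipped location is filled */
    }
    if (t->holes == 0)
        t->amin = t->amax;              /* everything up to amax is now readable */
}

/* Number of entries the read pointer may consume without passing amin. */
static int32_t tracker_readable(const hole_tracker *t)
{
    return t->amin - t->rdptr;
}
```

Writing entries 0, 3, 2, 1 in that order leaves two holes after the second write, and amin jumps from 0 
to 3 only when the last hole is filled. 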


15.1. cmdBaseAddr0 


CMDFIFO 0’s base address pointer bits 23:0. CmdBaseAddr0 contains either the entire frame buffer 
address of the start of CMDFIFO, or contains the AGP address of the start of CMDFIFO. This register is 
read write, and has no default. 
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Bit Description 
24-bits of CMDFIFO address [23:0] in 4k byte pages. Default is 0x0. 


15.2. cmdBaseSize0 


cmdBaseSize0 contains the size of the command fifo in bits 7:0 in 4k byte pages, starting from 4k. Bit 8 
enables or disables command fifo 0 operation. Bit 9 defines the location of command fifo 0, a value of 0 
locates the command fifo in frame buffer memory, and value of 1 locates the command fifo in AGP 
memory. Bit 10 disables the hole counter. 


Bit Description 
7:0 Size of CmdFifo in 4k byte pages. (0=4k, 1=8k, etc...). Default is 0x0. 
8 CMDFIFO_0 enable (0=disable, 1=enable). Default is 0x0. 
9 CMDFIFO_0 resides in AGP (0=frame buffer memory, 1=AGP memory). Default is 0x0. 
10 Disable hole counter (0=enable, 1=disable). Default is 0x0. 
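
The cmdBaseSize fields can be composed as in the following sketch. The function name is hypothetical, 
and the caller is assumed to pass a size that is a nonzero multiple of 4k bytes and no larger than 256 
pages. 

```c
#include <stdbool.h>
#include <stdint.h>

/* Compose a cmdBaseSize value.
 * fifo_bytes           : CMDFIFO size in bytes, a nonzero multiple of 4096
 * enable               : bit 8, enable command fifo operation
 * in_agp               : bit 9, 0 = frame buffer memory, 1 = AGP memory
 * disable_hole_counter : bit 10, 1 = hole counter disabled */
static uint32_t cmd_base_size(uint32_t fifo_bytes, bool enable, bool in_agp,
                              bool disable_hole_counter)
{
    uint32_t pages = fifo_bytes / 4096u;
    return ((pages - 1u) & 0xFFu)       /* bits 7:0, where 0 means 4k */
         | ((enable ? 1u : 0u) << 8)
         | ((in_agp ? 1u : 0u) << 9)
         | ((disable_hole_counter ? 1u : 0u) << 10);
}
```

For example, an enabled 8k byte fifo in frame buffer memory with hole counting active encodes as 0x101. 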


15.3. cmdBump0 


cmdBump0 defines the number of words to increment the amin pointer by, when managed by software. 
This register is write only and has no default. 


Bit 
Number of words to bump CMDFIFO 0’s write pointer. Default is 0x0. 


15.4 cmdRdPtrL0 


cmdRdPtrL0 contains the lower 32 bits of the read pointer. This register is read write, has no default 
value, and allows software to monitor the progress of the CMDFIFO. At 
initialization, this register should be set to cmdBaseAddr, expanded to a byte address. 


Bit 
Lower 32-bits of the byte aligned CMDFIFO read pointer. Default is 0x0. 


15.5. cmdRdPtrH0 


cmdRdPtrH0 contains the upper 4-bits of the read pointer. This register is read write and has no default 
value. At initialization, this register should be set to cmdBaseAddr, expanded to a byte address. 


Bit 
Upper 4-bits of the CMDFIFO read pointer. Default is 0x0. 


15.6 cmdAMin0 


cmdAMin0 is a 25-bit register containing the min address register. The CmdAMin register is updated with 
the cmdAMax register when the hole count is zero. This register is read write and has no default value. At 
initialization this register should be set to cmdBaseAddr - 4. The value read back from this register is 4 
more than that written. 


Bit 


Byte Aligned Address Min register, bits 0 and 1 are ignored. Default is 0x0. 
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15.7 cmdAMax0 


cmdAMax0 is a 32-bit register containing the 25 bits of the max address register. The CmdAMax register is 
automatically updated when a memory address greater than the existing cmdAMax register is written. At 
initialization, this register should be set to cmdBaseAddr - 4. The value read back from this register is 4 
more than that written. 


Bit Description 
Byte Aligned Address Max register, bits 0 and 1 are ignored. Default is 0x0. 


15.8 cmdStatus0 


cmdStatus0 is a 32-bit register that allows debugging and visibility of the command fifo hardware. This 
register is read only, and has no impact on software development. 


Description 

AGP (packet 6) data xfer in progress 
Unpacker is busy 

Packet 6 decompress busy 

Host data transfers are complete 
JSR (fetch ctrl) active 

Executed depth = 0 

Prefetched depth = 0 

On chip fifo is empty 

Reserved. Default = 0 

JSR (unpacker) active 

Jump tag 

Jump command. 

Header Valid 

Local State (remaining entries in packet) 
Extended Packet Command 


15.9 cmdFifoDepth0 


cmdFifoDepth0 is a 20-bit register containing the current depth of CMDFIFO 0. Depth is the number of 
remaining unexecuted commands in off-chip memory. The CMDFIFO is allowed to read up to, but not beyond, 
the number of entries indicated by the cmdFifoDepth register. This register is read write and has no 
default value. 


Bit 
CMDFIFO 0 depth. Default is 0x0. 


15.10 cmdHoleCnt0 


cmdHoleCnt0 contains the number of unwritten locations between cmdAMin and cmdAMax. 


Bit 
CMDFIFO Hole counter. Default is 0x0. 


15.11 cmdBaseAddr1 
cmdBaseAddr1 is similar to cmdBaseAddr0, but controls CMDFIFO 1. 
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Bit Description 
Lower 23-bits of CMDFIFO address [31:0] in 4k pages. Default is 0x0. 


15.12 cmdBaseSize1 
cmdBaseSize1 is similar to cmdBaseSize0, but controls CMDFIFO 1. 


Description 
Size of CmdFifo in 4k byte pages. Default is 0x0. 
CMDFIFO_1 enable (0=disable, 1=enable). Default is 0x0. 
CMDFIFO_1 resides in AGP (0=frame buffer memory, 1=AGP memory). Default is 0x0. 
Disable hole counter (0=enable, 1=disable). Default is 0x0. 


15.13. cmdBump1 


cmdBump1 is similar to cmdBump0. 


Bit 
Number of words to bump CMDFIFO 1’s write pointer. Default is 0x0. 


15.14 cmdRdPtrL1 
cmdRdPtrL1 is similar to cmdRdPtrL0. 


Bit 
Lower 32-bits of the CMDFIFO read pointer. Default is 0x0. 


15.15 cmdRdPtrH1 
cmdRdPtrH1 is similar to cmdRdPtrH0. 


Upper 4-bits of the CMDFIFO read pointer. Default is 0x0. 


15.16 cmdAMin1 
cmdAMin1 is similar to cmdAMin0 


Byte Aligned Address Min register for command stream 1. Default is 0x0. 


15.17 cmdAMax1 
cmdAMax1 is similar to cmdAMax0 


Byte Aligned Address Max register for command stream 1. Default is 0x0. 


15.18 cmdStatus1 


cmdStatus1 is identical to cmdStatus0, but is used for the second command fifo. 


15.19 cmdFifoDepth1 
cmdFifoDepth1 is similar to cmdFifoDepth0 
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Bit Description 
CMDFIFO 1 depth in dwords. Default is 0x0. 


15.20 cmdHoleCnt1 
cmdHoleCnt1 is similar to cmdHoleCnt0 


Bit Description 


CMDFIFO 1’s hole counter. Default is 0x0. 
15.21 cmdFifoThresh 


The cmdFifoThresh register contains the fifo fetch threshold and the fifo high water mark. When 
fifo freespace is above the water mark, fill requests will be generated. When the high water mark is 
qualified, new requests are generated. 


CMDFIFO 0 and 1’s fifo fetch threshold (low water mark). Default value is 0. 


CMDFIFO 0 and 1’s fifo high water mark. Default value is 7. 

CMDFIFO 0 and 1’s fifo fetch threshold (low water mark), msb bit. Default value is 0. 
21:12 
15.22 cmdHoleInt 


cmdHoleInt bits 21:0 contain the number of MCLK cycles a hole counter can have a hole before
generating an interrupt. The counter is only enabled when bit 22 of this register is set. This register should
be used in combination with the IntrCtrl register to produce PCI interrupts for flagging insufficient data.


CMDFIFO 0 and 1 (holes != 0) time out value. Default is 0x0.


CMDFIFO Time Out Counter Enable. (0=Disable, 1 = Enable). Default is 0x0. 
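
As an illustration only, the two cmdHoleInt fields can be packed into a register value as sketched below. The macro and function names are hypothetical and do not come from any Napalm header; the bit positions (time out in bits 21:0, enable in bit 22) are taken from the description above. Writing the value to the device is not shown.

```c
#include <stdint.h>

/* Hypothetical names. Field layout as described in the text:
 * bits 21:0 = hole time out in MCLK cycles, bit 22 = counter enable. */
#define CMDHOLEINT_TIMEOUT_MASK 0x003FFFFFu
#define CMDHOLEINT_ENABLE       (1u << 22)

/* Build a cmdHoleInt value. Time outs wider than 22 bits are clamped
 * to the largest representable value instead of being truncated. */
static uint32_t cmd_hole_int_value(uint32_t mclk_cycles, int enable)
{
    if (mclk_cycles > CMDHOLEINT_TIMEOUT_MASK)
        mclk_cycles = CMDHOLEINT_TIMEOUT_MASK;
    return mclk_cycles | (enable ? CMDHOLEINT_ENABLE : 0u);
}
```

For example, a time out of 0x1000 MCLK cycles with the counter enabled yields 0x00401000.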


15.23 yuvBaseAddress 
yuvBaseAddress register contains the starting frame buffer location of the yuv aperture. 


Bit 
25:0 YUV base address. Default is 0x0.


15.24 yuvStride 
yuvStride register contains the destination stride value of the U and V planes. 


Y, U and V stride register. Default is 0x0. 


30:14 reserved 


Destination is tiled (0 = linear, 1 = tiled). Default is 0x0.


16. AGP/PCI Configuration Register Set


Register Name      Addr     Bits    Description
Vendor_ID                           3dfx Interactive Vendor Identification
Device_ID                           Device Identification
Status                              PCI device status
Revision_ID        8        7:0     Revision Identification
Class_code                  23:0    Generic functional description of PCI device
Reserved           28-43            Reserved
Reserved           56-59            Reserved
Interrupt_line     60       7:0     Interrupt Mapping
Min_gnt                             Bus Master Minimum Grant Time
Max_lat                             Bus Master Maximum Latency Time
fabID                               Manufacturing fab identification
ACPI Reset                  31:0
cfgStatus
cfgScratch                          Scratch pad register
AGP Cap_ID
AGP Status
AGP_Cmd
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16.1 Vendor_ID Register 


The Vendor_ID register is used to identify the manufacturer of the PCI device. This value is assigned by a 
central authority that will control issuance of the values. This register is read only. 


Bit Description 
3dfx Interactive Vendor Identification. Default is 0x121a. 


16.2 Device_ID Register 


The Device_ID register is used to identify the particular device for a given manufacturer. This register is 
read only. 


15:0 


16.3 Command Register 


The Command register is used to control basic PCI bus accesses. See the PCI specification for more 
information. Bits 0, 1 and 5 are R/W, and bits 15:6 and 4:2 are read only.


I/O Access Enable. Default is 0. 


Memory Access Enable (0=no response to memory cycles). Default is 0.
Master Enable. Default is 0. 


Special Cycle Recognition. Default is 0 


16.4 Status Register 


The Status register is used to monitor the status of PCI bus-related events. This register is read only and is 
hardwired to the value 0x0. 


Bit Description 
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Reserved. Default is 0x0. 
New Capabilities (AGP/ACPI). Default is 1 for AGP/ACPI (Strapped) 
66 MHz Capable. Default is 0 for PCI (33 MHz), 1 for AGP (Strapped)
UDF supported. Default is 0.
Fast Back-to-Back capable. Default is 0. (Strapped)
Data Parity Reported. Default is 0. 


Device Select Timing. Default is 0x0. 

Signaled Target Abort. Default is 0. 

Received Target Abort. Default is 0. 

Received Master Abort. Default is 0. 

Signaled System Error. Default is 0. 

Detected Parity Error. Default is 0. Cleared by writing this register. This feature is used 
for detecting parity errors on bus transfers. 


16.5 Revision_ID Register 


The Revision_ID register is used to identify the revision number of the PCI device. This register is read 
only. 


Napalm Revision Identification.


16.6 Class_code Register 


The Class_code register is used to identify the generic functionality of the PCI device. See the PCI 
specification for more information. This register is read only. 


Bit Description
23:0 Class Code. Default is 0x3.


16.7 Cache_line_size Register 


The Cache_line_size register specifies the system cache line size in doubleword increments. It must be
implemented by devices capable of bus mastering. This register is read only and is hardwired to 0x0. 


Bit Description
7:0 Cache Line Size. Default is 0x0.


16.8 Latency_timer Register 
The Latency_timer register specifies the latency of bus master timeouts. It must be implemented by
devices capable of bus mastering. This register is read only and is hardwired to 0x0.


Bit Description
7:0 Latency Timer. Default is 0x0.


16.9 Header_type Register 


The Header_type register defines the format of the PCI base address registers (memBaseAddr in 
Napalm). Bits 6:0 are read only and hardwired to 0x0. Bit 7 of Header_type specifies Napalm as a single
function PCI device. 


Bit Description 
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Header Type. Default is 0x0. 


Multiple-Function PCI device (0=single function, 1=multiple function). Default is 0x0. 


16.10 BIST Register 


The BIST register is implemented by those PCI devices that are capable of built-in self-test. Napalm does 
not provide this capability. This register is read only and is hardwired to 0x0. 


Bit Description 
BIST field and configuration. Default is 0x0. 


16.11 memBaseAddr0 Register 
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Description 
Memory Base #0 Address 


16.12 memBaseAddr1 Register 
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Bit Description 


Memory Base #1 Address 
16.13 ioBaseAddr Register 


The ioBaseAddr register determines the base address for all PCI IO mapped accesses to Napalm.
Writing 0xffffffff to this register will reset it to its default state. Once ioBaseAddr has been reset, it can be
probed by software to determine the amount of IO space required for Napalm. A subsequent write to
ioBaseAddr will set the IO base address for all PCI IO accesses. See the PCI specification for more
details on IO base address programming.
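
The probe step can be sketched as follows. This is a minimal sketch that assumes the standard PCI base address register sizing convention (for an IO base register, bit 0 reads back as 1 and bit 1 is reserved); the function name is hypothetical, and the configuration space access that produces the readback value is not shown.

```c
#include <stdint.h>

/* Decode the value read back from an IO base address register after
 * 0xffffffff has been written to it. Returns the number of bytes of IO
 * space requested, or 0 if the register is not implemented. */
static uint32_t io_bar_size(uint32_t readback)
{
    uint32_t mask = readback & ~0x3u;   /* strip the IO space indicator bits */
    if (mask == 0)
        return 0;
    return ~mask + 1u;                  /* lowest writable address bit gives the size */
}
```

Under that assumption, a readback of 0xfffffe01 corresponds to 512 bytes of IO space.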


FB_DATA strap      ioBaseAddr readback
1 (2'b01)          0xfffffe01 (512 bytes allocated)


Bit Description 
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IO Base Address 


16.14 subVendorID Register 


The subVendorID register defines the board manufacturer ID. During system initialization the expansion 
code located at romBaseAddr will set this register to the appropriate value. This register is read during 
plug and play initialization. See the PC97 specification for more details on subVendorID and plug and 
play requirements. The default value for this register is automatically loaded after reset from the ROM.
Bits 7:0 are stored in ROM location 0x7ff8, while bits 15:8 are stored in 0x7ff9 for a 32K ROM. Bits 7:0
are stored in ROM location 0xfff8, while bits 15:8 are stored in 0xfff9 for a 64K ROM.


Bit Description 


Subsystem Vendor ID register. Initialized by expansion prom, default is read by ROM. 


16.15 subSystemID Register 


The subSystemID register defines the board type. During system initialization, the expansion code located 
at romBaseAddr will set this register to the appropriate value. This register is read during plug and play 
initialization. See the PC97 specification for more details on subSystemID and plug and play 
requirements. The default value for this register is automatically loaded after reset from the ROM. Bits 7:0
are stored in ROM location 0x7ffa, while bits 15:8 are stored in 0x7ffb for a 32K ROM. Bits 7:0 are stored
in ROM location 0xfffa, while bits 15:8 are stored in 0xfffb for a 64K ROM.


Subsystem ID register. Initialized by expansion prom, default is read by ROM. 
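
The ROM locations given for subVendorID and subSystemID can be illustrated with the sketch below, which extracts both default values from a ROM image held in memory. The function name and the idea of working on an in-memory image are assumptions made for illustration; only the byte offsets come from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract the default subVendorID and subSystemID from a ROM image.
 * rom_size must be 0x8000 (32K ROM) or 0x10000 (64K ROM).
 * Returns 0 on success, -1 for an unsupported ROM size. */
static int rom_default_ids(const uint8_t *rom, size_t rom_size,
                           uint16_t *sub_vendor, uint16_t *sub_system)
{
    if (rom_size != 0x8000u && rom_size != 0x10000u)
        return -1;

    size_t base = rom_size - 8;   /* 0x7ff8 for 32K, 0xfff8 for 64K */

    /* Low byte first: bits 7:0 at base, bits 15:8 at base + 1. */
    *sub_vendor = (uint16_t)(rom[base] | (rom[base + 1] << 8));
    /* subSystemID follows at base + 2 (0x7ffa or 0xfffa). */
    *sub_system = (uint16_t)(rom[base + 2] | (rom[base + 3] << 8));
    return 0;
}
```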


16.16 romBaseAddr Register 


The romBaseAddr register determines the base address for all PCI ROM accesses to Napalm. Writing 
0xfffffffe to this register will reset it to its default state. Once romBaseAddr has been reset, it can be
probed by software to determine the amount of ROM space required for Napalm. A subsequent write to 
romBaseAddr will set the ROM base address for all PCI memory accesses. See the PCI specification for 
more details on memory base address programming. Napalm requires 32 to 64 Kbytes of address space for 
ROM accesses and is configured by strapping bit 2. For ROM accesses on the 32-bit PCI bus, the contents 
of romBaseAddr are compared with the pci_ad bits 31..16 (upper 16 bits) to determine if Napalm is being 
accessed. This register is R/W. 


Expansion ROM Base Address. Default is 0xffff8000 or 0xffff0000.


16.17 Capabilities Pointer 


The Capabilities pointer register contains the offset in configuration space of the beginning of the
capability linked list structure. This register is read only.


31:0 Capabilities Pointer offset. Default is 0x00000054 if AGP is enabled via the strapping 
bits, otherwise it is 0x60. 


16.18 Interrupt_line Register 


The Interrupt_line register is used to map PCI interrupts to system interrupts. In a PC environment, for 
example, the values of 0 to 15 in this register correspond to IRQ0-IRQ15 on the system board. The value
0xff indicates no connection. This register is R/W.
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Bit Description 


Interrupt Line. Default is 0x5 (IRQ5)


16.19 Interrupt_pin Register 


The Interrupt_pin register defines which of the four PCI interrupt request lines, INTA* - INTD*, the
PCI device is connected to. This register is read only and is hardwired to 0x1. 


Bit Description 
7:0 Interrupt Pin. Default is 0x1 (INTA*)


16.20 Min_gnt Register 


The Min_gnt register specifies the burst period a PCI bus master requires. It must be implemented by 
devices capable of bus mastering. This register is read only and is hardwired to 0x0 since Napalm does not 
support bus mastering. 


Description 


Minimum Grant. Default is 0x0. 


16.21 Max_lat Register 


The Max_lat register specifies the maximum request frequency a PCI bus master requires. It must be 
implemented by devices capable of bus mastering. This register is read only and is hardwired to 0x0 since 
Napalm does not support bus mastering. 


16.22 fabID Register 


Identification code of the manufacturing plant. 


Bit Description 


Manufacturing fab identification. Read Only. 


16.23 ACPI Reset Register 
The ACPI Reset register returns the status of the internal acpi_reset signal.


16.24 cfgInitEnable Register


The cfgInitEnable register is used to control miscellaneous configuration functions.

ioBaseAddr bits(10:8). Default is 0x0.

11 Address snoop enable (1=enable). Default is 0.

12 Address snoop memBaseAddr0 enable (1=enable). Default is 0.

13 Address snoop memBaseAddr1 enable (1=enable). Default is 0.

Address snoop Master/Slave (1=slave). Default is 0.


24:15 Snoop Address #0. When Address snooping of memBaseAddr0 is enabled
(cfgInitEnable[11]=1 and cfgInitEnable[12]=1), the incoming PCI address high order
bits are compared with cfgInitEnable bits(24:15). The amount of address space snooped
is controlled by cfgPciDecode[13:10]. The PCI cycle is snooped if the address
comparison passes and a memory read transaction is being performed. Note that
memory writes are not snooped for memBaseAddr0. Default is 0x0.
25 Swapbuffer algorithm (0=use vsync, 1=use sli_syncin/sli_syncout)
26 Swap Master (1=master; only used when cfgInitEnable[25]=1)
Use "Quick" sampling algorithm on sli_syncin (1=enable; only used when
cfgInitEnable[25]=1).

PCI multi-function device (sets bit(7) of the Header_type configuration register).
Default is poweron strap value of FB_DATA_19.
Disable linear frame buffer read cache (in pci_lfb_rd.v) (1=disable)

Enable snooped writes to hardware initialization registers (1=enable)
reserved


16.25 cfgPciDecode Register 


The cfgPciDecode register is used to control the amount of memory decoded for the various memory bases 
in Napalm. 


3:0 pci_membase0_decode. Default is poweron strap value of {FB_DATA_3, FB_DATA_2,
FB_DATA_1, FB_DATA_0}

7:4 pci_membase1_decode. Default is poweron strap value of {FB_DATA_7, FB_DATA_6,
FB_DATA_5, FB_DATA_4}

9:8 pci_iobase_decode. Default is poweron strap value of {FB_DATA_9, FB_DATA_8}

17:14 snoop_membase1_decode. Default is 0x0.

27:18 Snoop Address #1. When Address snooping of memBaseAddr1 is enabled
(cfgInitEnable[11]=1 and cfgInitEnable[13]=1), the incoming PCI address high order
bits are compared with cfgPciDecode bits(27:18). The amount of address space snooped
is controlled by cfgPciDecode[17:14]. The PCI cycle is snooped if the address
comparison passes and a memory read or write transaction is being performed. Default
is 0x0.

31:28


The amount of memory decoded for memBaseAddr0 and memBaseAddr1 (as controlled by
cfgPciDecode[3:0] and cfgPciDecode[7:4] respectively), and the amount of memory snooped for
memBaseAddr0 and memBaseAddr1 (as controlled by cfgPciDecode[13:10] and cfgPciDecode[17:14]
respectively), are as follows:


cfgPciDecode[xx:yy]    Amount of address space decoded in particular address space
                       128 MBytes
1                      256 MBytes


cfgVideoCtrl1 register (Default is 0x0).

31:24 sli_comparemask_crt


cfgVideoCtrl2 register (Default is 0x0). 


Bit Description 


sli_rendermask_aafifo 


sli_comparemask_aafifo 


31:16 reserved 


The cfgVideoCtrl0, cfgVideoCtrl1, cfgVideoCtrl2, and cfgVideoCtrl3 registers control the following
external signals:


Signal Name  Description

dac_vsync
The dac_vsync signal is always tristated when dac_vsync_float is set, regardless of
all other settings. When dac_vsync_float is cleared and enhanced_video_en is set,
then dac_vsync is driven if enhanced_video_slv is cleared (and tristated if
enhanced_video_slv is set).

dac_hsync
The dac_hsync signal is always tristated when dac_hsync_float is set, regardless of
all other settings. When dac_hsync_float is cleared and enhanced_video_en is set,
then the following equation is used to determine whether dac_hsync is driven:
(((scanline[7:0] & sli_rendermask_crt) == sli_comparemask_crt) ^
sli_crt_compare_invert)
If enhanced_video_en is set and the above equation is true, then dac_hsync is
driven, otherwise dac_hsync is tristated.

TV control signals (tv_clk_out, tv_hsync, tv_vsync, tv_blank)
If video_tv_output_en is set, then the TV control signals are driven, regardless of all
other settings. If video_tv_output_en is cleared and enhanced_video_en is set, then
the TV control signals are driven if enhanced_video_slv is cleared, otherwise if
enhanced_video_en and enhanced_video_slv are both set then the TV control
signals are tristated.

tv_data[11:0]
If video_tv_output_en is set, then the tv_data signals are driven, regardless of all
other settings. If video_tv_output_en is cleared and enhanced_video_en is set, then
the following equation is used to determine whether the tv_data signals are driven:
(((scanline[7:0] & sli_rendermask_crt) == sli_comparemask_crt) ^
sli_crt_compare_invert)
If enhanced_video_en is set and the above equation is true, then the tv_data signals
are driven, otherwise the tv_data signals are tristated.

Digital AA signals (aa_vld, aa_clk, aa_data[11:0], vmi_addr[3:0], vmi_data[7:0],
vmi_rw, vmi_ds_n, vmi_rdy)
If enhanced_video_en is set and enhanced_video_slv is set, then the following
equation is used to determine whether the digital AA signals are driven:
(((scanline[7:0] & sli_rendermask_aafifo) == sli_comparemask_aafifo) ^
sli_aafifo_compare_invert)
If enhanced_video_en is set and enhanced_video_slv is set and the above equation
is true, then the digital AA signals are driven, otherwise if enhanced_video_en is set
and enhanced_video_slv is cleared the digital AA signals are tristated. (Note that
if enhanced_video_en is set then the VMI host bus signals are controlled by the
above rules, otherwise if enhanced_video_en is cleared then the VMI host bus
signals are controlled by the VMI control logic). *** Note that valid data is only
transferred on the digital AA signals when enhanced_video_en is set and the above
equation is true; otherwise aa_vld is tristated and no valid data is transferred
across the digital AA bus.
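
The scanline comparison used throughout this table can be sketched in C as shown below. This is an illustrative sketch only: the function name is hypothetical, and it assumes that the operator joining the comparison and the compare_invert bit is an exclusive OR (the operator symbol is damaged in the source text). The same helper applies to the crt, aafifo and fetch mask sets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluate (((scanline[7:0] & rendermask) == comparemask) ^ invert).
 * Only the low 8 bits of the scanline number take part in the compare. */
static bool sli_scanline_match(uint32_t scanline, uint8_t rendermask,
                               uint8_t comparemask, bool invert)
{
    bool match = ((uint8_t)scanline & rendermask) == comparemask;
    return match != invert;   /* logical exclusive OR of two booleans */
}
```

With rendermask = 0x01 and comparemask = 0x00, even scanlines match when the invert bit is cleared and odd scanlines match when it is set, which is the pattern a two-chip configuration alternating single scanlines would use.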


The cfgVideoCtrl0, cfgVideoCtrl1, and cfgVideoCtrl2 registers also control the following internal
signals:
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video_fifo_push When enhanced_video_en is set, then the following equation is used to 
determine whether video data is requested from the frame buffer and pushed 
onto the video fifo: 
(((scanline[7:0] & sli_rendermask_fetch) ==
sli_comparemask_fetch) ^ sli_fetch_compare_invert)
If enhanced_video_en is set and the above equation is true, then video data is 
requested from the frame buffer and pushed onto the video fifo, otherwise no 
video data is requested and no data is pushed onto the video fifo. 


video_fifo_pop When enhanced_video_en is set, then the following equation is used to 
determine whether video data is popped off the video fifo: 
(((scanline[7:0] & sli_rendermask_fetch) ==
sli_comparemask_fetch) ^ sli_fetch_compare_invert)
If enhanced_video_en is set and the above equation is true, then video data is 
popped off the video fifo during display refresh. *** Note that if 
enhanced_video_en is set and the above equation is not true that no data is 
popped off the video fifo (implying that the data to be presented to the 
monitor will be received from the AA fifo). 

aafifo_pop When enhanced_video_en is set, then the following equation is used to
control the aafifo_pop signal:
(((scanline[7:0] & sli_rendermask_aafifo) ==
sli_comparemask_aafifo) ^ sli_aafifo_compare_invert)
If enhanced_video_en is set and the above equation is true, then aafifo_pop
is asserted, otherwise aafifo_pop is deasserted. *** Note that if the AA fifo
is empty, aafifo_pop is deasserted regardless of the result of the equation
above (i.e. a pop will never be generated which would underflow the fifo).

DAC_blank The DAC_blank signal is asserted during the active video region for a
particular scanline when enhanced_video_en is set and the following
equation is false:
(((scanline[7:0] & sli_rendermask_crt) == sli_comparemask_crt) ^
sli_crt_compare_invert)
If enhanced_video_en is set and the above equation is true, then the
DAC_blank signal is deasserted during the active video region of a particular
scanline.


localmux (MUX #1) select When video_localmux_sel is set, then the output of the video localmux
(MUX #1) will be the sum of the desktop and overlay surfaces. If
video_localmux_sel is cleared, then the output of the video localmux will be
the result of the standard chroma-key comparison. Note that
enhanced_video_en has no effect on the output of the video localmux (MUX
#1).

othermux (MUX #2) select [1:0] (video_othermux_sel[1:0]) When enhanced_video_en is set, then
the following equation is used to control the video othermux (MUX #2) select signals (2 bits):
(((scanline[7:0] & sli_rendermask_aafifo) ==
sli_comparemask_aafifo) ^ sli_aafifo_compare_invert)
If enhanced_video_en is set and the above equation is true, then the video
othermux select signal is video_othermux_sel_true[1:0]. If
enhanced_video_en is set and the above equation is false, then the video
othermux select signal is video_othermux_sel_false[1:0]. If
enhanced_video_en is cleared, then the output of the video othermux (MUX
#2) is set to be the output of the video localmux (MUX #1).
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16.27 cfgSliLfbCtrl Register 


The cfgSliLfbCtrl register is used to control the functionality of linear frame buffer accesses when
scanline interleaving is enabled.


Description

sli_lfb_renderMask (unsigned value)
sli_lfb_compareMask (unsigned value)

sli_lfb_numchips_log2 [# chips in SLI (log2)]

sli_lfb_cpu_wr_en (1=enable) [drop scanlines in memBaseAddr1 SLI/AA tiled space
not owned for direct cpu lfb writes]
sli_lfb_dptch_wr_en (1=enable) [drop scanlines in memBaseAddr1 SLI/AA tiled space
not owned for dispatched lfb writes; dispatched lfb writes are those which are generated
from within a command fifo packet]

sli_lfb_rd_en (1=enable) [for a given chip, return only scanlines owned for lfb reads
accessing the SLI/AA tiled memory region in memBaseAddr1]

31:28


16.28 cfgAaDepthBufferAperture Register 


The cfgAaDepthBufferAperture register is used to define the aperture within the memBaseAddr1 
address space which defines the depth buffer location. cfgAaDepthBufferAperture is used by the linear
frame buffer read module for AA reads so as to ensure that reads from the depth buffer do not get averaged 
together to form an anti-aliased result. Instead, reads from the depth buffer do not get averaged and only 
the values from the primary surface are returned (i.e. only one sub-sample is returned). 


Bit Description
14:0 aa_depth_buffer_begin (bits[26:12], specified in 4K pages)
15 reserved
31:16 aa_depth_buffer_end (bits[27:12], specified in 4K pages)
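
As a sketch under stated assumptions, the aperture fields can be packed from byte addresses as shown below. The function name is hypothetical. The position of aa_depth_buffer_begin in register bits 14:0 is inferred from the field widths (its bit numbers are damaged in the source), and whether the end page is inclusive or exclusive is not stated in the text, so the caller decides which address to pass.

```c
#include <stdint.h>

/* Pack byte addresses within memBaseAddr1 into a
 * cfgAaDepthBufferAperture value.
 *   begin: address bits[26:12] -> register bits 14:0 (assumed position)
 *   end:   address bits[27:12] -> register bits 31:16
 * Both fields are expressed in 4K pages. */
static uint32_t aa_depth_aperture(uint32_t begin_addr, uint32_t end_addr)
{
    uint32_t begin_pages = (begin_addr >> 12) & 0x7FFFu;  /* 15-bit field */
    uint32_t end_pages   = (end_addr   >> 12) & 0xFFFFu;  /* 16-bit field */
    return begin_pages | (end_pages << 16);
}
```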


When in a multi-chip configuration using scanline interleaving, sli_lfb_cpu_wr_en is set to cause direct cpu
writes in the SLI/AA tiled aperture of memBaseAddr1 to be dropped if a particular scanline is not owned
by a given chip. An incoming 27-bit address is within the SLI/AA tiled aperture of memBaseAddr1 if the
address is greater than or equal to the tiled aperture beginning (controlled by the lfbMemoryTileCtrl and
lfbMemoryTileCompare registers). Similarly, sli_lfb_dptch_wr_en is set to cause dispatched writes
(writes within the memBaseAddr1 address space generated by command fifo packets) to be dropped if a
particular scanline is not owned by a given chip.


23:16 sli_lfb_scanMask (unsigned value)


sli_lfb_rd_en is set to cause a given chip to return data from a linear frame buffer read within the SLI/AA
tiled aperture of memBaseAddr1. This bit must be set for normal SLI operation.


For both reads and writes to the SLI/AA tiled aperture of memBaseAddr1, the calculated Y value is
modified prior to physical address calculation in order to pack scanline "bands" into a particular chip's
memory as follows (exactly the same formula used for rendering, as defined in the 3D register sliCtrl):


N = log2(# scanlines rendered by each chip, valid values {1,2,4,8,16,32,64,128}) [0 <= N < 8]
M = log2(# chips in SLI configuration, valid values {2,4,8}) [0 < M < 4]

ChipID = unique value identifying each chip in an SLI configuration (range 0-7 inclusive)
sli_lfb_renderMask = [(# chips in SLI configuration) - 1] << N
sli_lfb_compareMask = ChipID << N
sli_lfb_scanMask = 2^N - 1
sli_lfb_numchips_log2 = M


Note that a combination of M and N must be selected such that sli_lfb_renderMask, sli_lfb_compareMask,
and sli_lfb_scanMask are 8-bit quantities (i.e. less than 256).


When either sli_lfb_cpu_wr_en or sli_lfb_dptch_wr_en is set, a linear frame buffer access within the
SLI/AA tiled aperture of memBaseAddr1 is written when (y & sli_lfb_renderMask) == sli_lfb_compareMask.
The Y value used for physical address calculation is modified when either sli_lfb_cpu_wr_en or
sli_lfb_dptch_wr_en is set as follows to reduce the amount of frame buffer memory required for a given
chip:


y = [(y >> M) & ~sli_lfb_scanMask] + [y & sli_lfb_scanMask]
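
The mask definitions, the ownership test and the Y packing formula above can be worked through with the following sketch. The struct and function names are hypothetical; the arithmetic follows the formulas in this section, with the scan mask taken as 2^N - 1. The parameter check rejects combinations of M and N whose masks would not fit in 8 bits.

```c
#include <stdbool.h>
#include <stdint.h>

struct sli_lfb_masks {
    uint8_t render_mask;    /* [(# chips) - 1] << N */
    uint8_t compare_mask;   /* ChipID << N          */
    uint8_t scan_mask;      /* 2^N - 1              */
    uint8_t numchips_log2;  /* M                    */
};

/* Compute the cfgSliLfbCtrl mask fields for one chip.
 * n = log2(scanlines per band), m = log2(number of chips).
 * Returns false if the parameters are out of range. */
static bool sli_lfb_compute(unsigned n, unsigned m, unsigned chip_id,
                            struct sli_lfb_masks *out)
{
    if (m < 1 || m > 3 || n > 7 || n + m > 8 || chip_id >= (1u << m))
        return false;
    out->render_mask   = (uint8_t)(((1u << m) - 1u) << n);
    out->compare_mask  = (uint8_t)(chip_id << n);
    out->scan_mask     = (uint8_t)((1u << n) - 1u);
    out->numchips_log2 = (uint8_t)m;
    return true;
}

/* True if scanline y is owned by the chip described by k. */
static bool sli_lfb_owns(const struct sli_lfb_masks *k, uint32_t y)
{
    return (y & k->render_mask) == k->compare_mask;
}

/* Pack an owned scanline number into the chip's local frame buffer:
 * y = [(y >> M) & ~scanMask] + [y & scanMask]. */
static uint32_t sli_lfb_pack_y(const struct sli_lfb_masks *k, uint32_t y)
{
    uint32_t scan = k->scan_mask;
    return ((y >> k->numchips_log2) & ~scan) + (y & scan);
}
```

For two chips with bands of two scanlines (N = 1, M = 1), chip 1 owns scanlines 2, 3, 6, 7 and so on, and the packing maps them to local rows 0, 1, 2, 3.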


16.29 cfgAaLfbCtrl Register 


The cfgAaLfbCtrl register is used to control the functionality of linear frame buffer accesses when
anti-aliasing interleaving is enabled.


Description


aa_lfb_cpu_wr_en (1=enable) [broadcast direct cpu lfb writes within the SLI/AA tiled
aperture of memBaseAddr1 to both AA rendering surfaces]

aa_lfb_dptch_wr_en (1=enable) [broadcast dispatched lfb writes within the SLI/AA tiled
aperture of memBaseAddr1 to both AA rendering surfaces -- dispatched lfb writes are
those which are generated from within a command fifo packet]

aa_lfb_rd_en (1=enable) [Enable anti-aliased lfb reads accessing the SLI/AA tiled
memory aperture of memBaseAddr1]

aa_lfb_rd_format (0=16bpp, 1=15bpp, 2=32bpp, 3=reserved)

aa_lfb_rd_divide_by_four (0=divide by 2, 1=divide by 4)


aa_lfb_cpu_wr_en is set to cause direct cpu writes in the SLI/AA tiled aperture of memBaseAddr1 to be
broadcast to both AA sets of rendering buffers (where the location of the second set of rendering buffers is
defined in the Secondary colBufferAddr and Second auxBufferAddr registers). An incoming 27-bit
address is within the SLI/AA tiled aperture of memBaseAddr1 if the address is greater than or equal to the
tiled aperture beginning (controlled by the lfbMemoryTileCtrl and lfbMemoryTileCompare registers).
Similarly, aa_lfb_dptch_wr_en is set to cause dispatched writes (writes within the memBaseAddr1
address space generated by command fifo packets) to be broadcast to both AA sets of rendering buffers.


aa_lfb_rd_en is set to cause all 4 subsample buffers which compose a particular pixel’s value to be
averaged together for CPU reads in the SLI/AA tiled aperture of memBaseAddr1. This bit should be set
for normal AA operation.


16.30 cfgSliAaMisc Register 


The cfgSliAaMisc register is used to control miscellaneous configuration functions.


8:0 vga_vga_vsync_offset. Default is 0x0.


10:9 hotplug signal control (0,1=tristate, 2=drive low, 3=drive high). Default is 0x0. The
hotplug signal needs to be an input when used for LCD hotplug functionality, and thus
cfgSliAaMisc bits(10:9) should be 0x0 when this functionality is desired.
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hotplug pin value (input value of the hotplug pin). Default is 0. 


12 aa_lfb_rd_slv_wait (1=wait for aa data to be transferred before transferring own data).
Default is 0. Used for multi-chip configurations with more than 2 chips.


31:13 reserved 


The vga_vga_vsync_offset field (cfgSliAaMisc[8:0]) is programmed as follows (copied from Rich
Goodin’s email):


The following is the description of how to program the three fields of vga_vsync_offset: 


vga_vsync_offset is a 9 bit field subdivided into three 3 bit subfields. vga_vsync_offset[2:0] is
pixel offset and shifts the synchronization point in pixel increments. vga_vsync_offset[5:3] is
character offset and shifts the synchronization point in character (8 pixel) increments.
vga_vsync_offset[8:6] is a bit more difficult to explain as it is a preset for the internal
horiz_xtra signal. The preset value can range from 0 to 5 and I'll explain how it is determined below.


The VGA horizontal timing is based on the value of a horizontal counter. The width of the scan 
line is specified by setting the maximum horizontal count by writing the horizontal total register. 
But, as with all things VGA, the scanline length is not simply horizontal total character times, but 
horizontal total + 5 character times. Internally, the VGA counts to horizontal total and then counts 
horizontal xtra for 5 more character times. 


The vsync_ref signal is passed through a "debounce" network which introduces a delay of
approximately 16 character clock cycles. The goal of loading the vga_vsync_offset values is to
preset the slave VGA CRTC to the correct state the delay time after the master's vsync start. In the
case of the delay being exactly 16 pixel clocks the following settings would sync up the master
and slave exactly. If the desire is to actually cause the slave to run ahead of the master then increase
the values, if the desire is to run after, decrease the values.


For the 16 bit delay, this breaks down to 2 characters and no pixels, so vga_vsync_offset[2:0] is set
to 0 and vga_vsync_offset[5:3] is set to 2. vga_vsync_offset[8:6] is set depending on the setting of
hsync_start. If the programmed delay value is set up to occur prior to horizontal total, the
value is set to 0. If the delay value is past horizontal total then the programmed value is set as
follows:


If hsync_start + character delay <= horizontal total, vga_vsync_offset[8:6] = 0.
If hsync_start + character delay == horizontal total+1, vga_vsync_offset[8:6] = 1.

And so on to the maximum value of 5.


*** Note that Napalm has a bug which makes the value vga_vsync_offset[2:0]=7 behave as if 
vga_vsync_offset[2:0]=15, so vga_vsync_offset[5:3] must be adjusted accordingly. 
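
The programming rule above can be sketched as a small helper. This is an illustration under assumptions: the function name is hypothetical, hsync_start and the horizontal total are taken to be in character clocks, and "and so on" is read as the preset growing by one per character past horizontal total, up to 5. The sketch does not work around the vga_vsync_offset[2:0]=7 bug noted above.

```c
#include <stdint.h>

/* Build the 9-bit vga_vsync_offset field.
 *   delay_pixels: delay to compensate for, in pixel clocks
 *   hsync_start, htotal: CRTC values in character clocks (assumed units)
 * Layout: [2:0] pixel offset, [5:3] character offset, [8:6] preset. */
static uint16_t vga_vsync_offset(unsigned delay_pixels,
                                 unsigned hsync_start, unsigned htotal)
{
    if (delay_pixels > 63u)
        delay_pixels = 63u;               /* two 3-bit fields hold at most 63 */

    unsigned pixel_off = delay_pixels & 7u;
    unsigned char_off  = delay_pixels >> 3;
    unsigned preset    = 0;

    if (hsync_start + char_off > htotal) {
        preset = hsync_start + char_off - htotal;
        if (preset > 5u)
            preset = 5u;                  /* maximum preset value */
    }
    return (uint16_t)(pixel_off | (char_off << 3) | (preset << 6));
}
```

For the 16 pixel delay discussed above, the character offset is 2 and the pixel offset is 0; the preset stays 0 as long as hsync_start + 2 does not exceed horizontal total.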


16.31 cfgStatus Register 

The cfgStatus register is an alias to the normal memory-mapped status register. See section x.x for a 
description of the status registers. Reading the configuration space cfgStatus register returns the same
data as if reading from the memory-mapped status register. 
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16.32 cfgScratch Register 


The cfgScratch register can be used as scratch pad storage space by software. The values of cfgScratch
are not used internally for any functionality, so any value can be stored to and read from cfgScratch.


Scratchpad register. Default is 0x0. 


16.33 New capabilities (AGP and ACPI) 

AGP and ACPI use PCI’s new capabilities mechanism. The New Capabilities structure is implemented as
a linked list of registers containing information for each function supported by Napalm. The list contains 
both AGP status and command registers. AGP registers read back ‘0’ if AGP is disabled via the strapping 
pins. 


16.34 Capability Identifier Register 
The capability register resides at offset (CAP OFFSET). This register identifies AGP revision compliance.


Description 


Reserved. Defined as 0. 


16.35 AGP Status 


The AGP status register documents the maximum number of requests that Napalm can manage, whether
Napalm is AGP sideband capable, and the supported transfer rates.


2:0 Data rates that Napalm can deliver/receive. Bit[0] = 1x, bit[1] = 2x, bit[2] = 4x. Default
is 3.
AGP_4G. AGP supports above 4 Giga bytes of memory. Default is 1.
8:6 Reserved. Default is 0.
9 SBA. Device supports side band addressing.
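
A minimal sketch of decoding the fields listed above is given here. The function names are hypothetical; the bit positions for the data rates (bits 2:0) and SBA (bit 9) follow the descriptions in this section. A value of 0 is treated as AGP being disabled, since the AGP registers read back 0 when AGP is disabled by the strapping pins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the fastest data rate advertised in the AGP status register
 * (1, 2 or 4), or 0 if no rate bit is set. */
static unsigned agp_max_rate(uint32_t agp_status)
{
    if (agp_status & 0x4u) return 4;
    if (agp_status & 0x2u) return 2;
    if (agp_status & 0x1u) return 1;
    return 0;
}

/* True if the SBA bit (bit 9) reports side band addressing support. */
static bool agp_supports_sba(uint32_t agp_status)
{
    return ((agp_status >> 9) & 1u) != 0;
}
```

With the documented default of 3 in the rate field, the highest advertised rate is 2x.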


16.36 AGP Command 


The AGP command register enables the AGP function and side band addressing, and sets the
maximum number of requests.


AGP enable. Enables AGP function. AGP RESET sets this bit to 0. (R/W)
9 SBA_ENABLE. Enable side band addressing mechanism. (R/W)


23:10 Reserved. Default is 0
31:24 RQ_DEPTH. Max # of requests System can handle. (R/W)
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16.37 ACPI Cap ID 
The ACPI Cap ID register identifies what Napalm supports in ACPI. 


Description 
Capability ID. Always == 1 for ACPI
Next Capability ID Pointer. Default is 0 


18:16 Version. Default is 0x1. 


PME Clock. Default is 0. 
Aux Power Source. Default is 0. 

DSI. Default is 1. Indicates additional software initialization must take place.

Reserved. Default is 0 


31:27 PME Support. Default is 0. 


16.38 ACPI Ctrl/Status 


ACPI status register allows transition from the D3 to D0 state.


1:0 Power State. Defaults to 0x0. Napalm only accepts writes of 0x0 or 0x3 to these bits.
(R/W)


Reserved. Default is 0 

Sticky bit. Default is 0

Data Select. Default is 0
14:13 Data Scale. Default is 0
15 Sticky bit. Default is 0
22 B2_B3 support. Default is 0.
23 BPCC_En. Default is 0.


17. Init Registers


Register Name     I/O Address   Bits   R/W   Description
status            00-03         31:0   R     Napalm status register
pciInit0
                                             Starting Tile page and stride register
                  24-27         31:0   R/W   Texture Cache initialization, and tv and aa data
                                       R/W   VGA initialization register
2d_command        30-33         31:0   W     2d command register (to be used to write
                                             SGRAM mode and special mode registers)
2d_srcBaseAddr    34-37         31:0   W     2d srcBaseAddr register (to be used to write to
                                             SGRAM mode and special mode registers)
                                             Impedance Matching register.


17.1 status Register (0x0) 

The status register provides a way for the CPU to interrogate the graphics processor about its current state 
and FIFO availability. The status register is read only, but writing to status clears any Napalm generated 
PCI interrupts. 


6 Vertical retrace (0=Vertical retrace active, 1=Vertical retrace inactive). Default is 1.
7 FBI graphics engine busy (0=engine idle, 1=engine is busy). Default is 0.
9 Napalm busy (0=idle, 1=busy). Default is 0.


10 2D busy (0=idle, 1=busy). Default is 0. 


Cmd fifo 0 busy. Default is 0x0. 
Cmd fifo 1 busy. Default is 0x0.


27:15 
30:28 Swap Buffers Pending. Default is 0x0. 
31 PCI Interrupt Generated. Default is 0x0.


Bits(4:0) show the number of entries available in the internal host FIFO. The internal host FIFO is 32 
entries deep. The FIFO is empty when bits(4:0)=0x1f. Bit(6) is the state of the monitor vertical retrace 
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signal, and is used to determine when the monitor is being refreshed. Bit(7) of status is used to determine 
if the graphics engine of FBI is active. Note that bit(7) only determines if the graphics engine of FBI is 
busy — it does not include information as to the status of the internal PCI FIFOs. Bit(8) of status is used to 
determine if TREX is busy. Note that bit(8) of status is set if any unit in TREX is not idle — this includes 
the graphics engine and all internal TREX FIFOs. Bit(9) of status determines if all units in the Napalm 
system (including graphics engines, FIFOs, etc.) are idle. Bit(9) is set when any internal unit in Napalm is 
active (e.g. graphics is being rendered or any FIFO is not empty). Bit(10) of status is used to determine if 
the 2D graphics engine is active. Bits(11:10) of status are used to determine if either command fifo 0 or
command fifo 1 is active. When a SWAPBUFFER command is received from the host CPU, bits(30:28)
are incremented; when a SWAPBUFFER command completes, bits(30:28) are decremented. Bit(31) of
status is used to monitor the status of the PCI interrupt signal. If Napalm generates a vertical retrace
interrupt (as defined in pciInterrupt), bit(31) is set and the PCI interrupt signal line is activated to generate
a hardware interrupt. 
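
The status fields described above can be unpacked with a few shifts and masks. The following C sketch is illustrative only: the structure and function names are hypothetical, and the bit positions are taken from the description in this section (bits 4:0, 6, 7, 8, 9, 10, 30:28 and 31).

```c
#include <stdint.h>

/* Hypothetical container for the status fields described in this section. */
typedef struct {
    unsigned fifo_free;     /* bits 4:0, free host FIFO entries (0x1f = FIFO empty) */
    unsigned vretrace;      /* bit 6, raw state of the vertical retrace signal */
    unsigned fbi_busy;      /* bit 7, FBI graphics engine busy */
    unsigned trex_busy;     /* bit 8, TREX busy */
    unsigned napalm_busy;   /* bit 9, any unit or FIFO in Napalm active */
    unsigned busy_2d;       /* bit 10, 2D engine busy */
    unsigned swaps_pending; /* bits 30:28, outstanding SWAPBUFFER commands */
    unsigned pci_interrupt; /* bit 31, PCI interrupt generated */
} napalm_status_t;

/* Decode a raw 32-bit status value into its fields. */
static napalm_status_t napalm_decode_status(uint32_t status)
{
    napalm_status_t s;
    s.fifo_free     = status & 0x1fu;
    s.vretrace      = (status >> 6) & 1u;
    s.fbi_busy      = (status >> 7) & 1u;
    s.trex_busy     = (status >> 8) & 1u;
    s.napalm_busy   = (status >> 9) & 1u;
    s.busy_2d       = (status >> 10) & 1u;
    s.swaps_pending = (status >> 28) & 0x7u;
    s.pci_interrupt = (status >> 31) & 1u;
    return s;
}
```

A driver would typically test napalm_busy before touching state that the pipeline depends on, and fifo_free before issuing a burst of register writes.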


17.2 pciInit0 Register (0x4) 
The pciInit0 register contains the control information on how PCI should behave. Bits 15:0 are the output of 
the counter clocked by GRX clock. Bits 19:18 control interrupts. Bits 17:13 allow the retry interval to be 
increased, while bits 12 and 11 allow retries to be disabled. Bits 9 and 8 determine the bus performance. 
Bits 6:2 determine the PCI fifo low water mark. This value should never be 0 (no overflow checking is done) and 
should be set greater than 2 for any fast device operations. Bits 25:20 control how many non modal LFB 
accesses are grouped together before being pushed to memory. This register is read/write and defaults to 
0x01800040. 


Bits 28:27 control the adjustable timeout for PCI writes travelling into the frame buffer. After the indicated 
number of clocks, writes in the PCI fifo will get flushed out to the frame buffer. By increasing this timeout, 
LFB/Command traffic can be made more efficient because the timeout will force the hardware to wait for 
more entries in the fifo to accumulate before writing anything to the frame buffer. By emptying the fifo in 
larger and less-frequent groups, overall Napalm memory efficiency can increase. The downside of 
increasing this threshold is potentially more latency between the time when a PCI write is received, and the 
time that it actually enters the frame buffer. In general, this increase in latency does not cause hardware 
slowdown. 


Bit 


6:2 PCI FIFO Empty Entries Low Water Mark. Valid values are 0-31. Default is 0x10. 
Reserved. Default is 0x1. 


Wait state cycles for PCI read accesses (0=1 ws, 1=2 ws). Default is 0x0. 
Wait state cycles for PCI write accesses (0=no ws, 1=one ws). Default is 0x0. 
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PCI Fifo read threshold. Default is 0x18. 
Force PCI/CMD Frame buffer accesses to high priority. (1=high, 0=low, except for 
PCI frame buffer reads). Default is 0x0. 
28:27 PCI fifo to LFB write timeout in MCLKs (0=64, 1=128, 2=196, 3=256). 
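
As an illustration of how the low water mark (bits 6:2) and the write timeout (bits 28:27) might be updated with a read-modify-write, here is a minimal C sketch. The macro and function names are hypothetical. Clamping the low water mark to at least 1 is an assumption that follows from the note that the value should never be 0.

```c
#include <stdint.h>

#define PCIINIT0_LWM_SHIFT      2
#define PCIINIT0_LWM_MASK       (0x1fu << PCIINIT0_LWM_SHIFT)      /* bits 6:2 */
#define PCIINIT0_TIMEOUT_SHIFT  27
#define PCIINIT0_TIMEOUT_MASK   (0x3u << PCIINIT0_TIMEOUT_SHIFT)   /* bits 28:27 */

/* Return reg with the PCI FIFO low water mark replaced (clamped to 1..31). */
static uint32_t pciinit0_set_low_water_mark(uint32_t reg, unsigned entries)
{
    if (entries < 1)  entries = 1;   /* 0 is never valid per the text above */
    if (entries > 31) entries = 31;
    return (reg & ~PCIINIT0_LWM_MASK) | ((uint32_t)entries << PCIINIT0_LWM_SHIFT);
}

/* Return reg with the PCI-fifo-to-LFB write timeout code (0..3) replaced. */
static uint32_t pciinit0_set_write_timeout(uint32_t reg, unsigned code)
{
    return (reg & ~PCIINIT0_TIMEOUT_MASK) |
           ((uint32_t)(code & 0x3u) << PCIINIT0_TIMEOUT_SHIFT);
}
```

Starting from the documented default of 0x01800040, selecting timeout code 3 yields 0x19800040 while leaving every other field untouched.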


17.3. sipMonitor Register (0x8) 


The sipMonitor register contains the silicon performance counters. Silicon performance is measured by 
clocking a counter from a ring oscillator of NORs and NANDs and comparing its value to a counter clocked 
by the GRX clock. The larger the process counter, the faster the process. Bits 15:0 are the output of the 
counter clocked by GRX clock. Bits 27:16 are the counter clocked by the ring oscillator. Bit 28 clears the 
ring oscillator counter to zero. Bit 29 selects either a NAND chain or a NOR chain so one can measure the P 
transistor strength or the N transistor strength. Bit 30 enables the monitor. This register is read/write and 
defaults to 0x40000000. 
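
The two counters can be read back and compared as in the C sketch below. The function names are hypothetical, and expressing the comparison as a ratio of the ring oscillator count to the GRX count is an assumption made for illustration; the text only states that a larger process counter indicates a faster process.

```c
#include <stdint.h>

/* Bits 15:0: counter clocked by the GRX clock. */
static unsigned sip_grx_count(uint32_t sip)  { return sip & 0xffffu; }

/* Bits 27:16: counter clocked by the ring oscillator. */
static unsigned sip_ring_count(uint32_t sip) { return (sip >> 16) & 0xfffu; }

/* Bit 30: monitor enable. */
static int sip_enabled(uint32_t sip)         { return (int)((sip >> 30) & 1u); }

/* Illustrative figure of merit: ring oscillator ticks per GRX tick.
 * Returns 0.0 when the GRX counter has not advanced yet. */
static double sip_process_ratio(uint32_t sip)
{
    unsigned grx = sip_grx_count(sip);
    return grx ? (double)sip_ring_count(sip) / (double)grx : 0.0;
}
```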


17.4 lfbMemoryConfig Register (0xC) 


lfbMemoryConfig 


Bit Description 


}28:0 mused 
| 
(300 | 
[eM erence erence nace 




lfbMemoryTileCtrl 
Description 


Tile aperture stride in bytes (0=1k, 1=2k, 2=4k, 3=8k, 4=16k). Default is 0x1. 
Number of sgram tiles in X. Default is 0xa. 


lfbMemoryTileCompare 
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17.5 miscInit0 Register (0x10) 


miscInit0 contains resets to all subsystems, pixel swizzling, and Y origin subtraction. Bits [1:0] reset the 
3D graphics subsystem. Bits [3:2] enable byte/word swizzling during register accesses to 2D or 3D. Bits 
[6:4] define resets for video, 2D, and memory subsystems. Bits [29:18] define the Y origin subtraction value 
used during address calculation when Y flip is enabled in fbzMode. Bits [31:30] enable byte/word 
swizzling during non modal LFB reads and writes. 


Description 

Miscellaneous Control 

FBI Graphics Reset (0=run, 1=reset). Default is 0. 

FBI FIFO Reset (0=run, 1=reset). Default is 0. [resets PCI FIFO and the PCI data 
packer] 


VGA Video Timing Reset (0=run, 1=reset). Default is 0. 
Programmable delay to be added to the blank signal before it outputs to the TV out 
interface. This is in terms of number of flops clocked by the 2x clock. The objective is to 
synchronize the blank signal with the data output by matching the CLUT delay. Default 
is 0x0. 
000 = 2 flops; 001 = 3 flops; 111 = 9 flops 

13:11 Programmable delay to be added to the vsync and hsync signals before they are output to 
the TV out interface. This is in terms of number of flops clocked by the 2x clock. The 
objective is to synchronize the sync signals with the data output by matching the CLUT 
delay. Default is 0x0. 
000 = 2 flops; 001 = 3 flops; 111 = 9 flops 

16:14 Programmable delay to be added to the vsync and hsync signals before they are output to 
the monitor. This is in terms of number of flops clocked by the 2x clock. The objective is 
to synchronize the sync signals with the data output by matching the delay through the 
CLUT and DAC. Default is 0x0. 
000 = 2 flops; 001 = 3 flops; 111 = 9 flops 

Y Origin Definition bits 
29:18 Y Origin Swap subtraction value (12 bits). Default is 0x0. 
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17.6 miscInit1 Register (0x14 when miscInit0[30]=0) 


miscInit1 register controls miscellaneous operations of Napalm available in real mode. Bit 0 is used to 
correct for CLUT addresses being inverted during host accesses. This bit should be set to 1 for proper 
operation. Bit 3 enables and disables writes to the PCI subVendorID and subSystemID registers. Bit 4 
enables writes to the ROM through romBaseAddr. Bit 5 enables the new triangle address aliasing 
allowing better address compaction. Bit 6 disables texture mapping. 


Power down of Napalm is controlled by bits 11:7, where bit 7 powers down the color lookup tables, bit 8 
powers down the DAC itself, bits 9, 10, and 11 power down the three PLL’s. 


Bits 17 and 18 disable stalling on the opposite pipe (either 2D or 3D) when a command is sent down. These 
bits are used for testing, and should not be set during normal operation. 


Bit 19 is used to terminate command fifo activity. Setting this bit to ‘1’ halts the command fifo and resets 
all of the registers in the command register space to their default values. In order for Napalm to be shut 
down gracefully, this bit should only be set when Napalm is idle. Be sure to restore this bit to 0 when 
finished. 


Bits 28 through 24 indicate the value of the strapping registers at boot up. Note that altering these bits affects 
the read back information of PCI and AGP resource reporting. For more information on the strapping 
registers, see the section on Power on Strapping. 
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Description 

Miscellaneous Control 

invert_clut_address. Default = 0. 

tri_mode: triangle iterator mode. Default is 0x0. 

Enable Sub Vendor/Subsystem ID writes. (0=disable, 1=enable). Default is 0x0. 
Enable ROM writes. (0=disable, 1=enable). Default is 0x0. 

Alternate triangle addressing map (0=disable, 1=enable). Default is 0x0. 
Disable texture mapping (0=enable, 1=disable). Default is 0x0. 

Power Down Control 

Power Down CLUT. Default is 0x0. 

Power Down DAC. Default is 0x0. 

Power Down Video PLL. Default is 0x0. 


Disable 3D stall on 2D synchronous dispatch. When set to 1, 3D will not wait on 
pending 2D operations to complete before being issued. Default is 0x0. 


pd pin value (input value of the pd pin). Default is 0. 
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17.7 vip2vmiCtrl Register (0x14 when miscInit0[30]=1) 


vip2vmiCtrl controls the VIP-to-VMI translation module in Napalm. The default value is 0x80001. Note 
that vip2vmiCtrl is accessed by setting miscInit0[30] and accessing the miscInit1 register. When 
miscInit0[30]=1, then accesses to miscInit1 register will instead be directed to vip2vmiCtrl. 


Description 

0 VIP-to-VMI disable (0=enable VIP-to-VMI translation, 1=passthru VMI signals 
unmodified). Default is 1. 

1 vbi_int (1=enable interrupt at end of VBI lines). Default is 0. 

2 vbi_crop (1=enable VBI cropping). Default is 0. 

3 reserved 
8:4 vbi_max[4:0]. Default is 0. 
18:9 vid_max[9:0] 
19 Reset vip2vmi module (1=reset module). Default is 1. 


When vbi_int (vip2vmiCtrl[1]) is set, it enables an interrupt at the end of the VBI lines. A rising edge on 
vmi_intr will be generated at the earlier of either the EAV of the first line without the “V” bit set in the SAV 
code or the EAV of the last line as defined by the VBI MAX counter. (Note that the V and F bits in 
EAV/SAV only change at EAV. The EAV has the codes for the next line.) This function is independent 
of the crop bit below. It is possible to have the VBI interrupt enabled and crop the VBI data over the port. 


When vbi_crop (vip2vmiCtrl[2]) is set, vmi_vact will not be active for any lines with “V” set in their 
EAV/SAV code. This crops the VBI data. 


The vbi_max field (vip2vmiCtrl[8:4]) contains the number of lines with the “V” bit set in EAV/SAV that 
will have VMI_VACT active during their valid pixel period. Any additional lines will be cropped. 


The vid_max field (vip2vmiCtrl[18:9]) contains the number of lines with the “V” bit clear in EAV/SAV that 
will have VMI_VACT active during their valid pixel period. Any additional lines will be cropped. 


vip2vmiCtrl bits[31:30] have different definitions when they are written as opposed to when they are 
being read. When read, vip2vmi_intr_field (vip2vmiCtrl[30]) is the state of the “F” bit in the last EAV 
code before VMI_INTR went active. Also when read, vip2vmi_intr_type (vip2vmiCtrl[31]) is set if the 
last VMI_INTR was for VBI data and cleared if the last VMI_INTR was for non-VBI data. When written, 
vip2vmiCtrl[30] is a reserved bit and vip2vmiCtrl[31] should be set for VIP-to-VMI translation. 


Note that vip2vmiCtrl[0] must be cleared, vip2vmiCtrl[19] must be cleared, and vip2vmiCtrl[31] must 
be set in order for proper VIP-to-VMI translation to occur. When VMI video port is desired, 
vip2vmiCtrl[0] should be set, vip2vmiCtrl[19] should be set, and vip2vmiCtrl[31] should be cleared (this 
is the poweron default state). 
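
The two configurations described above can be composed from the individual fields. The C sketch below is only an illustration: the macro and function names are hypothetical, and it builds the value to write; it does not perform the miscInit0[30] redirection needed to reach vip2vmiCtrl.

```c
#include <stdint.h>

#define VIP2VMI_DISABLE     (1u << 0)                          /* 1 = pass VMI through */
#define VIP2VMI_VBI_INT     (1u << 1)
#define VIP2VMI_VBI_CROP    (1u << 2)
#define VIP2VMI_VBI_MAX(n)  (((uint32_t)(n) & 0x1fu) << 4)     /* bits 8:4 */
#define VIP2VMI_VID_MAX(n)  (((uint32_t)(n) & 0x3ffu) << 9)    /* bits 18:9 */
#define VIP2VMI_RESET       (1u << 19)
#define VIP2VMI_XLATE       (1u << 31)                         /* write-side meaning of bit 31 */

/* Value for VIP-to-VMI translation: bits 0 and 19 clear, bit 31 set. */
static uint32_t vip2vmi_translate_value(unsigned vbi_max, unsigned vid_max,
                                        int vbi_int, int vbi_crop)
{
    uint32_t v = VIP2VMI_XLATE | VIP2VMI_VBI_MAX(vbi_max) | VIP2VMI_VID_MAX(vid_max);
    if (vbi_int)  v |= VIP2VMI_VBI_INT;
    if (vbi_crop) v |= VIP2VMI_VBI_CROP;
    return v;
}

/* Value for plain VMI video port operation (the power-on default, 0x80001). */
static uint32_t vip2vmi_passthru_value(void)
{
    return VIP2VMI_DISABLE | VIP2VMI_RESET;
}
```

For example, 16 VBI lines and 240 active lines with the VBI interrupt enabled give 0x8001E102.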


17.8 dramInit0 Register (0x18) 


dramInit0 controls the sgram interface timing of specific timing parameters. The default value of this 
register is 0x00579d29. 


Bit 
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i. 1) Sgram access timing 


tRRD — row active to row active (1-4 clks). Default is 0x1 (2 clks) 

tRCD — RAS to CAS delay (1-4 clks). Default is 0x2 (3 clks). 

tRP — row precharge (1-4 clks). Default is 0x2 (3 clks). 
tRAS — minimum active time (1-16 clks). Default is 0x4 (5 clks). 

tRC — minimum row cycle time (1-16 clks). Default is 0x7 (8 clks). 

tCAS latency (1-4 clks). Default is 0x2 (3 clks). 
tMRS mode and special mode register cycle time (1-2 clks). Default is 0x1 (2 clks) 
tDQR Rd to DQM assertion delay (0-1 clks). Default is 0x1 (1 clk) 


tBWC Block write cycle time (1-2 clks). Default is 0x1 (2 clks) 
2 


21:20 tBWL BKWR to Pre (1-4 clks). Default is 0x1 (2 clks) 


2 
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17.9 dramInit1 Register (0x1C) 


Description 

SGRAM Refresh Control 

Refresh Enable (0=disable, 1=enable). Default is 0. 

Refresh_load Value. (Internal 14-bit counter 5 LSBs are 0x0) Default is 0x100. 
Video Refresh Control 


SGRAM read data sampling control 


sg_clk_nodelay - bypass the delay element. Default = 1. 
sg_use_inv_sample - resample the flopped sgram data with another negative-edge flop 
before flopping data with mclk. Default = 0. 


sg_del_clk_invert - invert delayed clock before using it. Default = 0. 


19:16 sg_clk_adj - delay value for sgram read data sample clock. Default = 0x0. (2-62 NAND 
gates of delay, in steps of 4) 


SGRAM frame buffer output delay control (control + data bits) 


sg_oflop_del_adj - delay value for mclk (2-62 NAND gates of delay, in steps of 4) to 
transparent latch that sends fb_* off chip. Default = 0xf. 
sg_oflop_trans_latch - forces latch for fb_* signals to be transparent. (0=use delayed 
mclk to latch, 1=make latch always transparent). Default = 0. 
Memory Controller configuration bits 
mctl_short_power_on. Power on in 128 cycles. Default = 0. VMI ADDR 1 


mctl_no_vin_locking - prevent vin from locking the bus during requesting. 0=allow 
locking, 1=prevent locking. Default = 0. 

30 Rev. B0 and after: mctl_type_sdram - (0=use SGRAMs, 1=use SDRAMs). Default = 0. 
VMI ADDR 2 


When using SGRAMs, mctl_type_sdram should be set to 0. When using SDRAMs, only 16Mbit 
(16x512K parts) are supported, which result in a 16MB frame buffer. The sgram_type and sgram_chipsets 
bits in dramInit0 are ignored when mctl_type_sdram=1. 


Rev. B0 and after: Note that the fastfillCMD behaves differently when mctl_type_sdram=1 
(dramInit1[30]). When fastfilling with SGRAMs (mctl_type_sdram=0), if dithering is enabled and 
fastfillCMD[0]=1, no dithering will happen. But when fastfilling with SDRAMs (mctl_type_sdram=1), if 
dithering is enabled and fastfillCMD[0]=1, dithering will still happen, since SDRAMs do not support 
blockwriting. 
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17.10 agpInit0 Register (0x20) 


The agpInit0 register is used to control how AGP behaves when making requests. Bit 0 sets the request 
priority level. Bits [3:1] are now reserved. Bits [6:4] determine when the AGP request fifo becomes full 
(requests that have not yet been issued to the AGP target). Bits [10:7] control when too much data has been 
returned and AGP needs to begin stalling. 


Description 
0 Force AGP request to be high priority. (0=Low, 1=High). Default is 0x0. 
6:4 AGP request fifo full threshold. Default is 0x1. 
10:7 AGP read fifo full threshold. Default is 0x9. 
Reserved. Default is 0x0. 


17.11 tmuGbeInit Register (0x24) 
The tmuGbeInit register contains the fifo water marks for both the TMU and FBI sections. The default value 
of this register is 0x0bfb. Bits [19:15] control the delay (through a NAND chain) of the TV-Out output clock. 


Texture read request low water mark - fifo freespace level that TMU will empty the read 
request queue to before stopping a sequence. Default is 0xb. 


txc_disable_rdrq_max - disable the limit of 16 as the maximum number of reads 
in a row by the texture cache interface to the memory controller. Default = 0. 
txc_use_min_req - sets the minimum number of reads done by the texture cache to 3. 
Default = 0. 


17.12 vgaInit0 Register (0x28) 


The vgaInit0 register is used for hardware initialization and configuration of the VGA controller in 
Napalm. VGA can be disabled by writing bit 0 to a “1”. Bit 1 allows external video timing to drive the 
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VGA core video scan out logic. Bit 2 controls how the VGA DAC control logic views the width of the 
RAM. For VGA compatibility, this bit should be set to 0 (6 bit DAC). 


VGA extensions are enabled by bit 6. These extensions are mentioned in the VGA portion of this spec, in the 
CRTC register space. Bit 10 enables the ability to read back the PCI configuration when bit 6 of this 
register is 0. 


Bit 8 determines if the chip should wake up as a VGA motherboard device or an add-in card. Bit 9 disables the 
VGA response to legacy address decoding. This bit should be set if Napalm is not the primary display 
adapter in the system. Setting this bit also disables write access to 0x46e8 and 0x102. 


Bit 12 should be set when in an extended (non-VGA) mode. This disables the VGA from fetching memory 
data during video raster scan out. 


Bit 13 is used when an external DAC is supported. This bit should always be set to 0. 


By default, VGA is placed at the beginning of memory. If need be, it can be moved anywhere on a 64K byte 
boundary within 64M bytes. 


Bit 22 disables VGA refresh control of board memory. When VGA is in scan out mode, it prefers memory 
refresh to happen at horizontal sync time. When this bit is set to 0, three memory refresh cycles happen 
after HBLANK occurs, and the memory refresh time out counter is deferred. When this bit is set to 1, the 
memory refresh time out counter explicitly controls memory refresh events. 
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Description 
Miscellaneous Control 


VGA disable. (0=Enable, 1=Disable). Setting this bit to 1 shuts off all 
access to the VGA core. 

Use external video timing. This bit is used to retrieve SYNC 
information through the normal VGA mechanism when the VGA 
CRTC is not providing timing control. 


VGA 6/8 bit CLUT. (0=6 bit, 1=8 bit). 


Enable VGA Extensions 


0x46e8/0x3C3 Wake up select (0=use 0x46e8, 1=use 0x3C3 or IO 
Base + 0xC3). VGA add-in cards use 0x46e8 while motherboard 
VGA uses 0x3C3. When Napalm is a multimedia device, this bit 
should be set to ‘1’ and the VGA subsystem should be enabled with IO 
Base + 0xC3. 


Disable VGA Legacy Memory/IO Decode (0=Enable, 1=Disable). 
Default is poweron strapping value FB_DATA_22. 

Use alternate VGA Config read back (0=Enable, 1=Disable). Setting 
this bit to 0 allows the VGA to read back configuration through CRTC 
index 0x1c. 


Enable Fast Blink (test bit) (1=fast blink, 0=normal blink). 


Use extended video shift out. Set this bit to 1 to disable all VGA 
memory access when video processor is shifting out data. 


Decode 3c6. (test bit) 


21:14 


22 Disable SGRAM refresh requests on HBLANK. When set to 1, the 
VGA does not produce memory refreshes during horizontal blanking. 
31:27 


17.13 vgaInit1 Register (0x2C) 


The vgaInit1 register contains the read and write apertures for VBE. VBE uses address 0xA0000 as an 
aperture into Napalm memory. See the section on VBE apertures in the VGA portion of this document. Bit 
20 enables sequential chain mode, a pseudo packed pixel format. Bits 28:21 define lock bits that disable 
writes to specific sections of the VGA core. See the section on register locking in the VGA portion of this 
document. 
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Description Default 


Enable 0xA0000 Sequential Chain 4 mode. 

Lock Horizontal Timing - 3B4/3D4 index 0, 1, 2, 3, 4, 5, 1a 

Lock Vertical Timing - 3B4/3D4 index 6, 7 (bits 7, 5, 3, 2 and 0), 9, 10, 11 
(bits[3:0]), 15, 16, 1b. 

Lock H2 - 0x3B4/0x3D4 index 17, bit 2 


Lock Vsync - 0x3C2, bit 7. 
Lock Hsync - 0x3C2, bit 6. 
Lock Clock Select - 0x3C2, bits 3 and 2. 


17.14 2d_command Register (0x30) 


Writing to this register is the same as writing to the 2d unit’s command register. This mapping is intended 
to provide a way to initialize the SGRAM mode and special mode registers at init time. 


17.15 2d_srcBaseAddr Register (0x34) 


Writing to this register is the same as writing to the 2d unit’s srcBaseAddr register. This mapping is 
intended to provide a way to initialize the SGRAM mode and special mode registers at init time. 
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18. Frame Buffer Access 


18.1 Frame Buffer Organization 

The Napalm linear frame buffer base address is located in a separate memory base address register in PCI 
config space and occupies 128 megabytes of address space for linear access. Linear memory starts at the 
beginning of SGRAM memory and finishes at the beginning of tiled memory, as specified by the tilebase register in 
init register space. It is assumed (but not required) that VGA will use the first 256K of linear memory, and 
the desktop, video, and textures will use the remaining linear memory. 


18.2. Linear Frame Buffer Access 


The linear frame buffer is accessed much like system memory, and can store the desktop, video, 3D front buffer, 
3D back buffer, 3D auxiliary buffer, and textures. Memory management is done with a true linear memory 
manager. 




18.3. Tiled Frame Buffer Access 


Tiled frame buffer access is a rectilinear memory based on 128 byte x 32 line tiles. Tiled memory is suited 
for 3D performance, where localized access is needed. Tiled frame buffer access is done with a 
concatenation of Y and X much like the 3D linear frame buffer access, and the frame buffer access of 
SST1. Tiled frame buffer Access is defined by writing the beginning tile/page into the tiled base address 
register. When configuring tiled frame buffer access, it is best to set the global tile stride to the largest 
surface width for best memory management. Memory management for tiled memory must be done by a 
rectilinear memory manager. Access to tiled memory is shown below. It is recommended that tiled 
memory be used sparingly since this memory fragments very easily, and utilization will not be 100%. 


[Figure: tiled memory layout. A tiled memory region of 1024 bytes x 192 lines is divided into tiles of 
128 bytes x 32 lines each.] 
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This is the PCI to XY calculation 

PCI Offset to XY byte conversion 

PCI offset[24:0] = {Y[14:0], X[9:0]} 


The following is the page calculation that must be used in the base address registers in video, 3D, 2D, and 
texture units. 


Page equation = (X / 128) + basepage + (Y / 32) * tilestride. 


Where X is referenced in bytes, and Y is referenced in lines. 
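
The PCI offset split and the page equation above can be written out directly. The following C sketch uses hypothetical function names; the tile dimensions (128 bytes x 32 lines) and the X[9:0]/Y[14:0] split are taken from this section, and integer division is assumed for the X / 128 and Y / 32 terms.

```c
#include <stdint.h>

#define TILE_WIDTH_BYTES   128u
#define TILE_HEIGHT_LINES  32u

/* Split a tiled-aperture PCI offset into X (bytes) and Y (lines):
 * offset[24:0] = {Y[14:0], X[9:0]}. */
static void tiled_offset_to_xy(uint32_t pci_offset, uint32_t *x, uint32_t *y)
{
    *x = pci_offset & 0x3ffu;
    *y = (pci_offset >> 10) & 0x7fffu;
}

/* Page = (X / 128) + basepage + (Y / 32) * tilestride. */
static uint32_t tile_page(uint32_t x, uint32_t y,
                          uint32_t basepage, uint32_t tilestride)
{
    return (x / TILE_WIDTH_BYTES) + basepage +
           (y / TILE_HEIGHT_LINES) * tilestride;
}
```

With basepage = 16 and tilestride = 8, byte 300 of line 70 falls in page 2 + 16 + 2 * 8 = 34.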


19. YUV Planar Access 


YUV planar memory allows the CPU to write Y, U, and V in separate regions of memory space. As Y, U, 
and V are written, they are converted into YUYV packed form and stored in the frame buffer at the correct 
offset from the YUV base address register. The first megabyte region defines Y space, where each 32-bit write 
generates a 64-bit write on Napalm with appropriate byte masks. The second megabyte region of YUV 
planar memory defines U space, where each 32-bit write generates two 64-bit writes with appropriate byte 
enable bits. The third megabyte region of YUV planar memory defines V space, where each 32-bit write 
generates two 64-bit writes with appropriate byte enable bits. The conversion between planar and packed 
is described below. YUV planar space has a fixed 1024 byte stride, and a programmable destination stride. 




0xC00000   Y 
0xD00000   U 
0xE00000   V 

[Figure: planar to packed conversion. Planar Y bytes Y0-Y15, with the corresponding U and V bytes, are 
shown mapped onto bytes 0-15 of the packed YUYV data in the frame buffer.] 

20. Texture Memory Access 


New for AVENGER: Avenger and Napalm require special flushing to be done 
around texture downloads. Please see section 20.3 for more information. 


There are two methods of storing textures: (1) a single base address for all LODs within a texture or (2) 
multiple base addresses. With method (1), textures are stored as if mipmapped, even for textures 
containing only one level of detail. The largest texel map (LOD level 0) is stored first, and the others are 
packed contiguously after; for tiled space, successive LODs after 0 are stored in a manner that groups the 
LODs into a somewhat rectangular space; more on this later. When only some or one of the LOD levels 
are used, lodmin and lodmax are used to restrict texture lookup to the levels that were loaded. 


With method (2), multi-base address mode, texbaseaddr points to where LOD level 0 starts and 
texbaseaddr1, texbaseaddr2, and texbaseaddr38 point to where LOD levels 1, 2, and 3-8 start, 
respectively. This mode provides more granularity to texture storage, and can help texture memory 
allocation. There is only one base address for mipmaps 3-8, so these are stored contiguously. 


Texture memory can be defined as either linear or tiled, as defined by the appropriate bit in texbaseaddr. 


When in single base address mode, texbaseaddr points to where the texture would start if it contained 
LOD level 0 (256x* dimension). As described above, all LODs are stored contiguously after the first. 


Addresses are generated by adding texbaseaddr and an offset that is a function of LOD, S, T, tclamps, 
tclampt, tformat, lod_aspect, lod_s_is_wider, trexinit0, and trexinit1. texbaseaddr can be set below zero, such 
that the offset to the texture wraps to a positive number. 


20.1 Writing to texture space 


Napalm provides a dedicated texture download port which is synchronized with normal rendering. Texture 
downloads done through this port are guaranteed to be processed in-order with 3d rendering, which 
alleviates the host from having to idle the chip before downloading. The texture port allows writing to 
frame buffer memory within a 4 byte aligned region. 


The Napalm texture download port occupies 2 Mbytes of PCI address space as shown in section 4. Note 
that only writes are supported through the texture download port; reads to this area return undefined data. 
Texture space can also be accessed (in a non-synchronized fashion) through the LFB port, which provides 
both read and write accesses. 


Texture downloads through the texture download port are always texbaseaddr-relative, even in multi-base 
address mode. texbaseaddr1, texbaseaddr2, and texbaseaddr38 are unused during such downloads. 
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linear texture space 


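[Figure: linear texture space. Within frame buffer space, LOD 0 starts at texbaseaddr and is followed by 
LOD 1, LOD 2, and LODs 3-8.] 


[Figure: tiled texture space, starting at texbaseaddr.] 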
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linear texture space - big textures, 
single base address 


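[Figure: linear texture space for big textures with a single base address. Within frame buffer space, 
LOD 0, LOD 1, and LOD 2 are followed by LODs 3-11, with texbaseaddr marked.] 


[Figure: tiled texture space for big textures, starting at texbaseaddr. Separate panels show the 
placement of LODs 6-11 for compressed textures and for non-compressed textures.] 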
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20.2 Calculating texel addresses 


When downloading textures through the texture download port, a different scheme of address translation is 
applied, depending on whether the texture memory space is in tiled or linear space. 


texture download = address definition 


In linear texture space, texels are packed totally linearly starting from the base address. Any texel can be 
referenced by using this equation: 

lfb_texel_addr[23:0] = ({texbaseaddr[1], texbaseaddr[24:4], 4’h0} + lod_offset[17:0] + s + 
t*lod_width); 

(lod_offset[17:0] is the offset of the given mipmap level from the base address) 


Linear 


Example: Calculate the physical address for texel 30,45 in LOD 2, with aspect=1x1, 16bpp texels, 
and a texbaseaddr = 0x200000; assume that LODs 0,1 exist and that we are non-multi base addr.: 


lfb_phys_addr[23:0] = (0x200000 + size_lod_0 + size_lod_1 + (30 + 45 * 64) * 2); 
= (0x200000 + (256*256*2) + (128*128*2) + (30+45*64)*2); 
= 0x2296bc; 

offset[20:0] = lfb_phys_addr[20:0] = 0x296bc; 


You could download to this texel by setting texbaseaddr to 0x200000 and then writing to the 
lower (s is even) 16 bits of this pci address: 
pci_addr[23:0] = (0x600000 + 0x296bc) = 0x6296bc; 
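
The linear example above can be reproduced in code. This C sketch is an illustration under stated assumptions: the function names are hypothetical, the texture is square with a 256x256 LOD 0 (aspect 1x1), single base address mode is used, and the 0x600000 aperture base is simply the value used in the example.

```c
#include <stdint.h>

/* Byte size of one square mipmap level, assuming LOD 0 is 256x256. */
static uint32_t lod_size_bytes(unsigned lod, unsigned bytes_per_texel)
{
    uint32_t dim = 256u >> lod;
    return dim * dim * bytes_per_texel;
}

/* Physical address of texel (s, t) in the given LOD for a linear texture
 * stored with a single base address: all lower-numbered LODs precede it. */
static uint32_t linear_texel_addr(uint32_t texbaseaddr, unsigned lod,
                                  unsigned s, unsigned t,
                                  unsigned bytes_per_texel)
{
    uint32_t addr  = texbaseaddr;
    uint32_t width = 256u >> lod;
    unsigned i;

    for (i = 0; i < lod; i++)
        addr += lod_size_bytes(i, bytes_per_texel);
    return addr + (s + t * width) * bytes_per_texel;
}
```

Calling linear_texel_addr(0x200000, 2, 30, 45, 2) reproduces the 0x2296bc result of the worked example, and masking to bits 20:0 and adding 0x600000 gives the 0x6296bc PCI address.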

Tiled 


texture download aperture address definitions - small textures 


16-bit textures 


Bits:   21 | 20:17    | 16:9   | 8:2    | 1:0 
Field:  -  | lod[3:0] | t[7:0] | s[7:1] | - 

8-bit textures 

Bits:   20 | 19:16    | 15:8   | 7:2    | 1:0 
Field:  -  | lod[3:0] | t[7:0] | s[7:2] | - 

4-bit compressed textures (TDFX. DXT1) 

219 18 15 4 = 9 8 4 3 2 10 
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In tiled texture space, LOD levels can be viewed as rectangles of texture space, which are packed edge-to- 
edge as described in the figure above. The aim of the packing is to make the footprint of a full texture 
rectangular. 
Example: Calculate the physical address for texel 30,45 in LOD 2, with aspect=1x1, 16bpp texels, 
and a texbaseaddr = 0x200000; assume that LODs 0,1 exist and that we are non-multi base addr: 
pci_download_addr[20:0] = (0x200000 + {4’h2, 8’h2d, 7’h0f, 2’b00}); 
= 0x245a3c; 


The pci write would have to write to only the lower (s is even) two bytes of the address to ensure that only 
texel 30, 45 is written. 
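
The concatenation used in the tiled example can be expressed with shifts. The sketch below covers only the 16-bit small-texture case and assumes the field placement implied by that example: lod[3:0] in bits 20:17, t[7:0] in bits 16:9, and s[7:1] in bits 8:2. The function name is hypothetical.

```c
#include <stdint.h>

/* Download-aperture offset for a 16-bit texel in tiled texture space,
 * i.e. {lod[3:0], t[7:0], s[7:1], 2'b00}. */
static uint32_t tiled_download_offset_16bpp(unsigned lod, unsigned s, unsigned t)
{
    return ((uint32_t)(lod & 0xfu) << 17) |
           ((uint32_t)(t & 0xffu) << 9) |
           ((uint32_t)((s >> 1) & 0x7fu) << 2);
}
```

For texel 30,45 in LOD 2 this yields 0x45a3c, which added to 0x200000 gives the 0x245a3c of the example. Since s[0] is dropped from the offset, the byte enables of the write select the texel within the 32-bit word.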


Napalm will allow texture memory to be loaded from two different address spaces, the first being linear 
frame buffer space, and the second being the SST1 texture download port. 
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20.3. Maintaining cache coherency in Napalm 


Napalm has two TMUs, and thus two texture caches. Napalm needs explicit software intervention to 
maintain cache coherency. 


The rules to maintain texture cache coherency with the frame buffer texture memory are as follows: 


1. Precede any group of texture downloads with a “pixel” flush of the 3d pipeline; i.e., issue a 2D 
NOPcmd to flush all pixels from the 3d section. 


2. Next, perform texture downloads. 


3. Then, explicitly flush the texture cache by changing the value of texBaseAddr. It is 
recommended that software write the inverse of the texBaseAddr to the register, and then 
re-write texBaseAddr with the correct value. 


4. Now, again, do a “pixel” flush with a 2D NOPcmd. This will force all texture downloads to 
complete. 


5. Now, proceed with new rendering commands. 
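
The ordering of the steps above is sketched below in C. Everything except the ordering is an assumption made for illustration: the register identifiers, the reg_write() backend (here a stand-in that only records writes so the sequence can be inspected without hardware), the value 0 written for the NOP command, and the empty download_textures() placeholder.

```c
#include <stdint.h>

/* Hypothetical register identifiers; real code would use register offsets. */
enum { REG_2D_NOPCMD = 0, REG_TEXBASEADDR = 1 };

typedef struct { int reg; uint32_t value; } reg_write_t;

/* Stand-in backend: records register writes instead of touching hardware. */
static reg_write_t write_log[8];
static int write_count;

static void reg_write(int reg, uint32_t value)
{
    if (write_count < 8) {
        write_log[write_count].reg = reg;
        write_log[write_count].value = value;
        write_count++;
    }
}

/* Placeholder for the actual writes to the texture download port. */
static void download_textures(void) { }

static void coherent_texture_download(uint32_t texbaseaddr)
{
    reg_write(REG_2D_NOPCMD, 0);               /* 1. pixel flush of the 3d pipeline */
    download_textures();                       /* 2. texture downloads */
    reg_write(REG_TEXBASEADDR, ~texbaseaddr);  /* 3. flush the texture cache by   */
    reg_write(REG_TEXBASEADDR, texbaseaddr);   /*    changing, then restoring it  */
    reg_write(REG_2D_NOPCMD, 0);               /* 4. pixel flush again */
    /* 5. the caller may now issue new rendering commands */
}
```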


The following is a list of programming guidelines which are detailed elsewhere but may have been 
overlooked or misunderstood: 


21.1 Memory Accesses 


All memory accesses to Napalm registers must be 32-bit word accesses only. Linear frame buffer accesses 
may be 32-bit or 16-bit accesses, depending upon the linear frame buffer access format specified in 
lfbMode. Byte (8-bit) accesses are only allowed to the Napalm linear frame buffer. 


21.2 Determining Napalm Idle Condition 


After certain Napalm operations, and specifically after linear frame buffer accesses, there exists a potential 
deadlock condition between internal Napalm state machines which is manifest when determining if the 
Napalm subsystem is idle. To avoid this problem, always issue a NOP command before reading the status 
register when polling on the Napalm busy bit. Also, to avoid asynchronous boundary conditions when 
determining the idle status, always read Napalm inactive in status three times. A sample code segment for 
determining Napalm idle status is as follows: 


/*****************************************************************************
 * SST_IDLE:
 *   returns 0 if SST is not idle
 *   returns 1 if SST is idle
 *****************************************************************************/
SST_IDLE()
{
    ulong j, i;

    // Make sure SST state machines are idle
    PCI_MEM_WR(NOPCMD, 0x0);
    i = 0;
    while(1) {
        j = PCI_MEM_RD(STATUS);
        if(j & SST_BUSY)
            return(0);
        else
            i++;
        if(i > 3)
            return(1);
    }
}
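

Because SST_IDLE() returns 0 as soon as it sees the busy bit, a caller that wants to block until the chip is idle has to call it repeatedly. A minimal sketch of such a wrapper follows; the names wait_for_idle, idle_probe_fn and fake_chip_t are hypothetical, and the probe is simulated so the loop can be exercised without hardware. On real hardware the probe would be a routine like SST_IDLE() above.

```c
#include <stdint.h>

/* Any routine with SST_IDLE-like semantics: returns 1 if idle, 0 if busy. */
typedef int (*idle_probe_fn)(void *ctx);

/* Poll the probe up to max_polls times.
 * Returns 1 if idle was observed, 0 if the bound was reached first. */
static int wait_for_idle(idle_probe_fn probe, void *ctx, uint32_t max_polls)
{
    for (uint32_t n = 0; n < max_polls; n++) {
        if (probe(ctx))
            return 1;
    }
    return 0;
}

/* Simulated chip: reports busy for a fixed number of probes, then idle. */
typedef struct { uint32_t busy_polls_left; } fake_chip_t;

static int fake_probe(void *ctx)
{
    fake_chip_t *chip = (fake_chip_t *)ctx;
    if (chip->busy_polls_left > 0) {
        chip->busy_polls_left--;
        return 0;
    }
    return 1;
}
```

Bounding the number of polls is a design choice of this sketch, so that a hung chip shows up as a failure rather than an endless loop.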


21.3 Triangle Subpixel Correction 


Triangle subpixel correction is performed in the on-chip triangle setup unit of Napalm. When subpixel
correction is enabled (fbzColorPath(26)=1), the incoming starting color, depth, and texture coordinate
parameters are all corrected for non-integer aligned starting triangle <x,y> coordinates. The subpixel
correction in the triangle setup unit is performed as the starting color, depth, and texture coordinate
parameters are read from the PCI FIFO. As a result, the exact data sent from the host CPU is changed to
account for subpixel alignments. If a triangle is rendered with subpixel correction enabled, all subsequent
triangles must resend starting color, depth, and texture coordinate parameters; otherwise the last triangle’s
subpixel-corrected starting parameters are subpixel corrected again, and incorrect results are generated.
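

The effect of correcting an already-corrected parameter can be illustrated numerically. The sketch below assumes a simplified linear model in which a start parameter is adjusted by the fractional x and y offsets multiplied by the parameter gradients; the function name and the model itself are assumptions for illustration and do not describe the exact arithmetic of the triangle setup unit.

```c
/* Simplified model (an assumption): corrected = start + fx*dP/dx + fy*dP/dy,
 * where fx and fy are the fractional offsets of the starting vertex. */
static double subpixel_correct(double start, double dpdx, double dpdy,
                               double frac_x, double frac_y)
{
    return start + frac_x * dpdx + frac_y * dpdy;
}
```

With start = 100, dP/dx = 8, dP/dy = 4, and offsets of 0.25 and 0.5, one correction gives 104. If the next triangle does not resend the parameter and the stored 104 is corrected again, the result is 108 instead of 104, which is the error described above.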


21.4 32 BPP Rendering 


• Enabling the destination alpha buffer. When 32 BPP rendering is enabled (renderMode[1:0]=0x2),
the destination alpha buffer is always automatically enabled, and fbzMode bit(18) (typically used to
select the auxiliary buffer as a destination alpha buffer) must be cleared.

• Write mask for the color buffers. When 32 BPP rendering is enabled, fbzMode bit(9) must be set to
allow writes to any of the individual color planes, and renderMode bits(19:17) are used to control
writes to each individual color plane. Note that clearing fbzMode bit(9) causes all color planes to not
be written, regardless of the individual settings of renderMode bits(19:17).

• Write mask for the depth and destination alpha buffers. When 32 BPP rendering is enabled,
fbzMode bit(10) is used to enable writes to the depth buffer, and renderMode bit(20) is used to
enable writes to the alpha buffer.

• Write mask for the stencil buffer. When 32 BPP rendering is enabled, stencilMode bits(23:16)
enable writes to the stencil buffer. During triangle rendering, individual bit write enable access is
supported by setting corresponding bits in stencilMode bits(23:16). However, for the FASTFILL
command, individual bit write enable control is not supported, and setting any bit in stencilMode
bits(23:16) enables the 8-bit value of the stencil reference value (stencilMode bits(7:0)) to be stored in
the stencil buffer.
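

The bit positions listed above can be gathered into one helper that computes the three register values for 32 BPP rendering. This is a sketch under stated assumptions: the macro names and the regs_32bpp_t structure are hypothetical, and the polarity of renderMode bits(19:17) is assumed to be 1 = plane writable, which the text above does not state explicitly. The mapping of those three bits to the R, G and B planes is also not given here, so the field is treated as an opaque 3-bit mask.

```c
#include <stdint.h>

#define FBZMODE_RGB_WRITE_EN        (1u << 9)
#define FBZMODE_DEPTH_WRITE_EN      (1u << 10)
#define FBZMODE_AUX_ALPHA_SELECT    (1u << 18)
#define RENDERMODE_FORMAT_MASK      (0x3u << 0)
#define RENDERMODE_FORMAT_32BPP     (0x2u << 0)
#define RENDERMODE_COLOR_MASK_SHIFT 17
#define RENDERMODE_COLOR_MASK       (0x7u << RENDERMODE_COLOR_MASK_SHIFT)
#define RENDERMODE_ALPHA_WRITE_EN   (1u << 20)
#define STENCILMODE_WMASK_SHIFT     16
#define STENCILMODE_WMASK           (0xFFu << STENCILMODE_WMASK_SHIFT)

typedef struct { uint32_t fbzMode, renderMode, stencilMode; } regs_32bpp_t;

/* Returns updated register values; all other bits are passed through. */
static regs_32bpp_t setup_32bpp_write_masks(regs_32bpp_t r, uint32_t color_planes,
                                            int depth_en, int alpha_en,
                                            uint8_t stencil_wmask)
{
    /* Select 32 BPP and clear fbzMode bit(18), as required in this mode. */
    r.renderMode = (r.renderMode & ~RENDERMODE_FORMAT_MASK) | RENDERMODE_FORMAT_32BPP;
    r.fbzMode &= ~FBZMODE_AUX_ALPHA_SELECT;

    /* Per-plane color mask (assumed 1 = writable); fbzMode bit(9) gates all planes. */
    r.renderMode = (r.renderMode & ~RENDERMODE_COLOR_MASK) |
                   ((color_planes & 0x7u) << RENDERMODE_COLOR_MASK_SHIFT);
    if (color_planes & 0x7u) r.fbzMode |= FBZMODE_RGB_WRITE_EN;
    else                     r.fbzMode &= ~FBZMODE_RGB_WRITE_EN;

    /* Depth and destination alpha write enables. */
    if (depth_en) r.fbzMode |= FBZMODE_DEPTH_WRITE_EN;
    else          r.fbzMode &= ~FBZMODE_DEPTH_WRITE_EN;
    if (alpha_en) r.renderMode |= RENDERMODE_ALPHA_WRITE_EN;
    else          r.renderMode &= ~RENDERMODE_ALPHA_WRITE_EN;

    /* Stencil write mask; the reference value in bits(7:0) is left untouched. */
    r.stencilMode = (r.stencilMode & ~STENCILMODE_WMASK) |
                    ((uint32_t)stencil_wmask << STENCILMODE_WMASK_SHIFT);
    return r;
}
```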


21.5 15 BPP Rendering 


• Enabling the destination alpha buffer. When 15 BPP rendering is enabled (renderMode[1:0]=0x1),
the 1-bit destination alpha buffer is automatically enabled, and fbzMode bit(18) (typically used to
select the auxiliary buffer as a destination alpha buffer) must be cleared.

• Write mask for the destination alpha buffer. When 15 BPP rendering is enabled, there is no
capability to disable writes to the destination alpha buffer separately from the color buffers. Writes to
the destination alpha buffer are simultaneously enabled when writes to the RGB buffers are enabled
(fbzMode bit(9)=1).

• Clearing the destination alpha buffer. When 15 BPP rendering is enabled, there is no capability to
selectively clear the 1-bit destination alpha buffer using the FASTFILL command. When using the
FASTFILL command, both color and the 1-bit alpha information are written simultaneously. The only
way to selectively clear the destination alpha buffer is to render a triangle and set up the alpha blending
modes to not modify the RGB channels and only modify the alpha channel as desired. Also note that
for the FASTFILL command, the value of the 1-bit alpha channel stored into the frame buffer is
calculated as specified in the fastfillCMD register description.


21.6 2 Pixel-per-clock Rendering 


• Enabling. 2 pixel-per-clock rendering is enabled by setting combineMode[29]. Note that
2 pixel-per-clock rendering may only be enabled for single-textured triangles. It is required that
software clear combineMode[29] when dual-texturing is desired.

• Disabling. When changing state from 2 pixel-per-clock rendering to single pixel-per-clock rendering,
12 NOP commands must be sent prior to the write which clears combineMode[29]. The 12 NOP
commands should be directed, using the chip field, only at the TMUs and not at the FBI chip. See the
combineMode register description for more information.

• TMU State updates. When 2 pixel-per-clock rendering is enabled and miscInit1[18]=0, then any
write to either TMU unit will be received by both (i.e. every write to a given TMU will be broadcast to
both, regardless of the chip field value). This functionality can be disabled by setting
miscInit1[18]=1.

• TMU State. In 2 pixel-per-clock rendering, each texture unit is used to generate pixels in scanline
“bands” (as controlled by renderMode[24:22]). As a result, it is required that each TMU’s state be
identical (and each set up for single texture-per-pixel rendering operation) in order to have correct
results. Software is required to ensure that each TMU state is identical and set up for single texturing
operation.

• Performance. 2 pixel-per-clock rendering will only increase performance when extra memory
bandwidth is available. When extra memory bandwidth is not available, 2 pixel-per-clock
rendering should be disabled, as enabling 2 pixel-per-clock rendering may actually be slower than
single pixel-per-clock rendering. As a result, 2 pixel-per-clock rendering should only be enabled for
15/16 bpp rendering modes, and should not be enabled for 32 bpp rendering modes (except possibly
for some very low resolutions). Also, for very high resolutions or high refresh rates, 2 pixel-per-clock
rendering may need to be disabled even for 15/16 bpp rendering modes. This performance tweaking
will need to be performed once actual silicon is available.

• Performance Tweaking. When 2 pixel-per-clock rendering is enabled, the number of scanlines that
are rendered by each texture unit is controlled by renderMode[24:22]. It is possible that different
values may be required to tweak individual games or benchmarks. This performance tweaking will
need to be performed once actual silicon is available.
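

The disabling rule above (12 NOPs directed at the TMUs, then the write that clears combineMode[29]) can be sketched as follows. The target encodings, the ppc_sim_t structure and the send_nop helper are hypothetical; in a real driver the NOPs would be nopCMD writes with the chip field selecting only the TMUs, as described in the combineMode register description.

```c
#include <stdint.h>

#define COMBINEMODE_2PPC_EN  (1u << 29)
#define NOPS_BEFORE_DISABLE  12u

/* Hypothetical chip-field encodings, for the simulation only. */
enum { TARGET_FBI = 1u << 0, TARGET_TMU0 = 1u << 1, TARGET_TMU1 = 1u << 2 };

/* Simulated state: the register plus counters of where NOPs were sent. */
typedef struct {
    uint32_t combineMode;
    unsigned nops_to_tmus;
    unsigned nops_to_fbi;
} ppc_sim_t;

static void send_nop(ppc_sim_t *hw, unsigned targets)
{
    if (targets & (TARGET_TMU0 | TARGET_TMU1)) hw->nops_to_tmus++;
    if (targets & TARGET_FBI)                  hw->nops_to_fbi++;
}

/* Switch from 2 pixel-per-clock to single pixel-per-clock rendering. */
static void disable_2ppc(ppc_sim_t *hw)
{
    if ((hw->combineMode & COMBINEMODE_2PPC_EN) == 0)
        return;                                   /* already disabled */
    for (unsigned i = 0; i < NOPS_BEFORE_DISABLE; i++)
        send_nop(hw, TARGET_TMU0 | TARGET_TMU1);  /* TMUs only, not FBI */
    hw->combineMode &= ~COMBINEMODE_2PPC_EN;      /* the clearing write */
}
```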


21.7 Scanline Interleaving 


• Y-origin swapping. Y-origin swapping must be used carefully when SLI is enabled. When Y-origin
swapping is enabled, the Y coordinate is subtracted from a constant to “flip” the origin from the
upper-left to the lower-left corner of the screen. The problem this presents for SLI is that this “flipping”
of the Y-coordinate causes scanlines which were not “owned” by a given chip to possibly be “owned,”
and vice versa (because the Y-coordinate is changing). To address this issue, software must also “flip”
the sli_comparemask fields in the sliCtrl register in order to compensate for the Y-coordinate
“flipping.” Also, the constant value from which the Y-coordinate is subtracted to accomplish this
“flipping” will need to be divided by the number of chips in an SLI configuration, because
the Y-address “munging” performed to pack a given chip’s memory occurs before the Y-coordinate
“flipping.”

• Access to a slave’s local memBaseAddr1 address space. When SLI and snooping are enabled, a
slave’s local memBaseAddr1 address space is unavailable. Do not attempt to access, via reads or
writes, a slave’s local memBaseAddr1 address space when SLI/snooping is enabled.

• Performance Tweaking. When SLI is enabled, each chip “owns” a programmable number of
scanline “bands.” The “band” height is set via the fields in the sliCtrl register. Performance tweaking
will be required once silicon is available to program the optimal “band” height for performance. Note
that different “band” heights may be required for different applications or benchmarks.

• Dithering. Dithering does not work properly when the “band” height is 1. If dithering is desired, do
not use a “band” height of 1.

• PCI Wait States. When snooping is enabled, 2 PCI read wait states (pciInit0[8]=1) and 1 PCI write
wait state (pciInit0[9]=1) must be used. Running with more aggressive wait state settings will break
the snooping functionality.

• FASTFILL command. The FASTFILL command has several problems when used in an SLI
configuration: (1) performance, and (2) compatibility when using SGRAM memories and filling tiled
surfaces. See the fastfillCMD register description for more information. It is recommended that
software use the 2D BLT engine to fill surfaces when in an SLI configuration.
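

Why Y-origin swapping changes scanline ownership can be seen with a deliberately simplified model. The sketch assumes ownership is (y / band_height) mod num_chips; this is an illustration only and is not the actual compare logic programmed through the sliCtrl register. Both function names are hypothetical.

```c
/* Simplified ownership model (an assumption, not the sliCtrl logic):
 * bands of band_height scanlines are dealt out to chips in turn. */
static unsigned sli_owner(unsigned y, unsigned band_height, unsigned num_chips)
{
    return (y / band_height) % num_chips;
}

/* Y-origin swap: the Y coordinate is subtracted from a constant. */
static unsigned flip_y(unsigned y, unsigned y_origin_const)
{
    return y_origin_const - y;
}
```

With two chips, a band height of 1 and a constant of 479, scanline 0 belongs to chip 0 before the flip, but the flipped coordinate 479 belongs to chip 1 under this model. This is the kind of ownership change the sli_comparemask “flip” has to compensate for.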


21.8 Miscellaneous Control 


• New texture and color combine operation. When the new texture and color combine unit
functionality of Napalm is required, combineMode[30] must be set to disable the use of
chromaKey and chromaRange for the texture color/chroma substitution function. Note that when the
new texture and color combine unit functionality is enabled, the texture color/chroma substitution
functionality is not available.

• Guardband clipping coordinates. When guardband clipping is enabled (renderMode[21]=1), the
left and right guardband clipping planes (as defined in clipLeftRight1) must be aligned to even pixels.

• Performance tweaking using triangle iterators column band control. The triangle iterators’
column band control is specified in fbzColorPath[31:30]. This controls how many pixels to render
horizontally before stepping down (to run in “legacy” performance mode, set this field to 0x0 for
8-wide column rendering). Once silicon is available, performance tweaking will need to be performed to
determine the optimal values for fbzColorPath[31:30]. Note that different values may be
required for different applications and/or benchmarks.

• PCI Read Wait States. 2 PCI read wait states should always be used (pciInit0[8]=1). A timing race
condition exists when only a single PCI read wait state is used, which may cause incorrect PCI read
data to be returned.

• Frame Buffer Command FIFO. If the command FIFO is being maintained in frame buffer memory,
it must be located in the lower 16 MBytes of memory. Placing the command FIFO in frame buffer
memory above 16 MBytes is not supported.

• Packet type 6. Packet type 6 does not work correctly with AGP command FIFOs. Packet type 6
works correctly with frame buffer command FIFOs.

• Dither rotation. The hardware has a bug which does not allow dither rotation to be used for
FASTFILL commands or 3D LFBs. Dither rotation must be disabled by clearing renderMode[25]
when performing FASTFILLs or 3D LFBs.
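

Two of the rules above reduce to simple bit manipulation, sketched here in C. The helper names are hypothetical. The text only requires that the guardband planes be even-aligned; rounding the left plane down and the right plane up is a choice made for this sketch, not a documented requirement.

```c
#include <stdint.h>

#define RENDERMODE_GUARDBAND_EN   (1u << 21)
#define RENDERMODE_DITHER_ROTATE  (1u << 25)

/* Align guardband planes to even pixels (rounding direction is an assumption). */
static uint32_t guardband_align_left(uint32_t x)  { return x & ~1u; }
static uint32_t guardband_align_right(uint32_t x) { return (x + 1u) & ~1u; }

/* renderMode value to use while issuing FASTFILL commands or 3D LFB accesses:
 * identical to the input except that dither rotation is switched off. */
static uint32_t rendermode_for_fastfill(uint32_t render_mode)
{
    return render_mode & ~RENDERMODE_DITHER_ROTATE;
}
```

A driver would typically save the original renderMode value, write the result of rendermode_for_fastfill() before the FASTFILL or LFB access, and restore the saved value afterwards.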


22. Accessing the ROM 


22.1 ROM Configuration 


Napalm supports either 32K or 64K of ROM space. The size of the ROM is determined at power-up
by an external strapping pin (see the section on strapping pins for more information).


Directly after reset, PCI subsystem and subvendor information is loaded from the next-to-last four bytes
of ROM memory. The last four bytes are reserved for checksum information.
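

The locations implied by this layout can be computed from the ROM size. The sketch reads “next-to-last four bytes” as the four bytes immediately preceding the final four; that reading, along with the structure and function names, is an assumption made for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t ids_offset;       /* subsystem/subvendor information (4 bytes) */
    uint32_t checksum_offset;  /* checksum information (last 4 bytes)       */
} rom_tail_t;

/* rom_size_bytes is 32*1024 or 64*1024, as selected by the strapping pin. */
static rom_tail_t rom_tail_layout(uint32_t rom_size_bytes)
{
    rom_tail_t t;
    t.checksum_offset = rom_size_bytes - 4u;
    t.ids_offset      = rom_size_bytes - 8u;
    return t;
}
```

Under this reading, a 32K ROM holds the ID bytes at offset 0x7ff8 and the checksum at 0x7ffc; for a 64K ROM the offsets are 0xfff8 and 0xfffc.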


22.2 ROM Reads


Napalm supports reads to the ROM through the normal PCI mechanism. In order to read the ROM, set
romBaseAddr register bit 0 to 1. ROM accesses are then possible at the address indicated by the most
significant bits of romBaseAddr. ROM reads can have any combination of byte enables asserted. Since
the ROM is a byte device, however, asserting multiple byte enables at once will cause the transfer of data
on the PCI bus to be slow.


It is important to note that the ROM shares the bus with VMI and TV out. During ROM accesses, data on
these ports will become ROM information, producing what may appear to be bad pixels on the display.
This is normal; however, if it is known that ROM accesses are to occur, it is recommended that VMI or
TV out be disabled prior to ROM access.


22.3 ROM Writes 


Napalm also supports a mechanism for programming flash ROMs when they are available. The model that 
Napalm uses is that of a 32K/64K EEPROM that allows programming by polling the EEPROM. 


By default, Napalm will not respond to writes pointed at by romBaseAddr. By enabling bit 0 of 
romBaseAddr and also setting bit 4 of miscInit1, writes pointed at by romBaseAddr will be processed. 


Typically, programmable ROMs have a sequence of write events that must occur for them to be placed in
‘Program Mode’. Then either a single write or multiple writes occur (depending on the ROM used) to fill in
data. Finally, the ROM is polled via ROM reads to confirm the write is complete. This process is repeated
until the ROM is completely written.


For more information on how to program a specific ROM, see its data sheet or application notes. 
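

The general flow described in this section (enable ROM decode and ROM writes, write, poll by reading back, repeat) can be sketched against a simulated ROM. Everything here is hypothetical: the rom_sim_t structure, the 64-byte simulated device and the read-back comparison used as the completion test. A real flash part needs its own unlock sequence and completion check, taken from its data sheet.

```c
#include <stddef.h>
#include <stdint.h>

#define ROMBASEADDR_ENABLE      (1u << 0)  /* romBaseAddr bit 0 */
#define MISCINIT1_ROM_WRITE_EN  (1u << 4)  /* miscInit1 bit 4   */
#define SIM_ROM_SIZE            64u        /* tiny stand-in for a 32K/64K part */
#define MAX_POLLS               1000u

typedef struct {
    uint32_t romBaseAddr;
    uint32_t miscInit1;
    uint8_t  rom[SIM_ROM_SIZE];
} rom_sim_t;

/* Writes are ignored unless both enable bits are set. */
static void rom_write8(rom_sim_t *hw, size_t off, uint8_t v)
{
    int enabled = (hw->romBaseAddr & ROMBASEADDR_ENABLE) &&
                  (hw->miscInit1 & MISCINIT1_ROM_WRITE_EN);
    if (enabled && off < SIM_ROM_SIZE)
        hw->rom[off] = v;
}

static uint8_t rom_read8(const rom_sim_t *hw, size_t off)
{
    return off < SIM_ROM_SIZE ? hw->rom[off] : 0xFFu;
}

/* Returns 0 on success, -1 if a byte never reads back correctly. */
static int rom_program(rom_sim_t *hw, size_t off, const uint8_t *data, size_t len)
{
    int status = 0;
    hw->romBaseAddr |= ROMBASEADDR_ENABLE;
    hw->miscInit1   |= MISCINIT1_ROM_WRITE_EN;
    for (size_t i = 0; i < len && status == 0; i++) {
        unsigned polls = 0;
        /* A device-specific 'Program Mode' unlock sequence would go here. */
        rom_write8(hw, off + i, data[i]);
        while (rom_read8(hw, off + i) != data[i]) {   /* poll via ROM reads */
            if (++polls >= MAX_POLLS) { status = -1; break; }
        }
    }
    hw->miscInit1 &= ~MISCINIT1_ROM_WRITE_EN;         /* back to read-only */
    return status;
}
```

Clearing the write-enable bit on exit restores the default behavior of ignoring writes to the ROM address range.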


23. Power on Strapping Pins 


Power On Strapping Pin    Description
VMI_ADDR_1                mctl_short_power_on (0=normal power-on, 1=for RTL simulation only)
VMI_ADDR_0                Re-map IDSEL (0=IDSEL is IDSEL, 1=PCI_AD_16 is IDSEL)
VMI_DATA_7                Disable PCI IRQ register (0=Enable, 1=Disable)
VMI_DATA_6
VMI_DATA_5                SGRAM number of chips
VMI_DATA_4                PCI Device Type (0=VGA, 1=Multimedia)
VMI_DATA_3                AGP Enable (0=Disabled, 1=Enabled)
VMI_DATA_1
VMI_DATA_0                PCI Fast Device (0=DEVSEL Medium, 1=DEVSEL Fast)


Power On Strapping Pin    Description
[...]                     chipID bit(4)
[...]                     chipID bit(3)
[...]                     pci_membase1 alloc bit(2)
[...]                     pci_membase1 alloc bit(1)
[...]                     pci_membase0 alloc bit(2)
[...]                     pci_membase0 alloc bit(1)


24. Signal Strapping


romwen


25. Monitor Sense 


Napalm supports the ability to detect a monitor, as well as to determine if the monitor is color or
monochrome. This is accomplished with an internal MSENSE signal. MSENSE becomes active when a
current is driven through either the RED, GREEN or BLUE DAC outputs. If a monochrome monitor is
present, only the GREEN output will cause MSENSE to become active. MSENSE is readable through IO
0x3c2, bit 4.
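

The detection rule can be written as a small classification function. The sketch assumes that software drives each DAC output in turn and reads IO 0x3c2 after each one, and that a set bit 4 means MSENSE is active; the driving procedure, the bit polarity and all names are assumptions for illustration. The function only interprets the three readings and performs no port IO itself.

```c
#include <stdint.h>

#define MSENSE_BIT (1u << 4)   /* bit 4 of the value read from IO 0x3c2 */

typedef enum { MONITOR_NONE, MONITOR_MONO, MONITOR_COLOR } monitor_kind_t;

/* Each argument is the value read from IO 0x3c2 while only the named
 * DAC output was being driven. */
static monitor_kind_t classify_monitor(uint8_t sense_red, uint8_t sense_green,
                                       uint8_t sense_blue)
{
    int r = (sense_red   & MSENSE_BIT) != 0;
    int g = (sense_green & MSENSE_BIT) != 0;
    int b = (sense_blue  & MSENSE_BIT) != 0;

    if (!r && !g && !b)
        return MONITOR_NONE;    /* MSENSE never active: no monitor detected */
    if (g && !r && !b)
        return MONITOR_MONO;    /* only GREEN activates MSENSE */
    return MONITOR_COLOR;
}
```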


26. Data Formats


Signal Pixel Sequence for 4 : 2 : 2
[Table contents not recoverable]


27. Issues/Requirements


27.1 PCI/AGP requirements


• [...] needs to be decoded for VGA
• Add 8-bit xfers
• Add AGP bus master
• Add interrupt logic
• Incorporate VGA information into PCI CFG space
• 0 wait state palette writes


27.2 2D requirements (SST-G)


• Binary / Ternary Rasterops
• [...]age patterns
• Blits, src stride and dst stride. Stretch blits
• 1:n color expansion (text)
• lfb byte writes
• YUV - RGB color space conversion
• SGRAM fast fill
• Lines / Tri’s and rects use FBI fastfill / triangles.
• Support 8bit, 16bit, and 24bit color formats


27.3 Video / Monitor requirements


• Anti Dither logic?
• Extra Gamma logic for 8bit desktop.
• 135 MHz DAC (1280x1024 resolution)
• Triple 256x8 lookup tables
• Hardware Cursor
• 64x64x2 windows compatible
• Cursor image data stored in offscreen memory
• Single 128 bit internal cache stores current scanline cursor information
• Cursor scanline is read during active hsync
• Registers required: curXpos, curYpos, curCtrl, CurC0, CurC1
• Cursor registers live in the pci domain
• X compatibility?
• Window ID information for entire screen is RLE encoded and stored in offscreen memory
• Each window can be single or double buffered
• Each window can be YUV or 16 bit RGB format color
• Bilinear filtering in both X and Y support for window magnification
• Decimation (point sampling) for window minification


27.4 VGA Controller requirements


• Windows desktop is a special case (wid 0) and is single buffered, palettized or 16/24 RGB only
• 4 or 8 or 16 unique window IDs supported
• wid requires widRowStart(buffer0), widRowStart(buffer1), widRowStart(buffer2), widCtrl,
widXStart, widYStart, widXSize, widYSize, wid_dudx, wid_dvdy
• Single 8 bit overlay with transparency?
• Pixel replication / line duplication for lower rez support (320x240)
• Software control of vsync/hsync
• DDC compliance
• vsync interrupts
• Genlock to external video source (tristate hsync/vsync controls)
• Invert hsync/vsync
• LCD shutter glasses support
• Interlaced video output support
• Filtered interlaced video output support?
• Video in? S3 Scenic Highway? VESA video In Port
• Support for VBE2.X LFB modes / and functions?
• Palette snooping
• [...] control in PCI space.
• Relocatable VGA extension ROM
• [...] needs to be disabled.
• Supports all VGA lfb modes including 4 bit planar.


27.5 Memory Controller requirements


• [...] Texture, 2D, 3D (color & auxiliary), Video
• 128 bit wide
• Support SGRAM write per bit, block write. Continued support for EDO
• 3D in a window
• Bank / port swizzling for 2D and texture performance
• Texture cache.
• Arbitrate between VGA, 2D, FBI, TMU
• Packed 24 bit mode in addition to 8 bit and 16 bit pixel formats
• Tiled, interleaved bank memory organization for optimal performance.


27.6 Configuration EEPROM


• Serial EEPROM for storing video configuration
• Support for VGA ROM expansion.


27.7 DAC requirements


• Must support pixel format switching per pixel
• Must support triple CLUT
• Must support >= 135 MHz pixel frequency.
• Must support 8 bit pseudo color lookup.
• Must reset to 8 bit pseudo color lookup
• VGA wants 18 bit CLUT writes.
• Support monitor sensing for VGA
• SAR at CLUT output for diagnostic/testability.
• DAC CLUT Addr (24 bits), DAC CLUT data (18-24 bits)


27.8 PLL requirements


• Must support video frequencies >= 135 MHz
• Graphics clock must reset >= 50 MHz, < 75 MHz
• Video clock must reset == 25.175 MHz (VGA)
• PLL must support the following 2 frequencies in HW: 25.175 and 28.322 MHz for VGA
• Requires M/N register pair per PLL. M is 8 bits, N is 8 bits. (14.318 MHz source)
• PLL output test port.


27.9 Overall requirements 


• Napalm must reset to VGA mode with no software help.
• Power down support? VBE 2.0 APM document?


27.10 PC97 requirements 


• Primary graphics adapter does not use legacy bus
• Support for NTSC/PAL TV
• Support for multiple adapters / monitors
• Minimum resolution 1024x768x16
• Graphics operations use relocatable registers only
• Graphics adapter operates normally with default VGA mode driver (4 bit planar)
• Ergonomic timing rates per current VESA specification: 75Hz
• Color ordering rgb: most significant is red, least significant is blue; bpp 15, 16, 24, 32
• Downloadable RAMDAC entries to support image color matching (gamma correction)
• Support of DDC 2.0 monitor detection
• VGA must be able to configure its BIOS base address to c000
• Direct frame buffer access can be performed at any time
• If supported, low resolution modes are 320x200, 320x240, 400x300, 512x384, and 640x400, all
in 8 or 16 bit depths
• Hardware arithmetic stretching
• Transparent blter
• Perform double buffering with no tearing
• Current scan line of refresh
• Programmable blter stride (better memory management)
• YUV off-screen surfaces for color space conversion


27.11 Testability requirements 

• RAM disable, PLL disable, blank for IDDQ.
• External access to pixel data at DAC input to verify DAC and pixel logic.
• Clock / PLL bypass for the tester. (2 pins, 1 for video, 1 for grx)
• Partial scan for coverage?


28. Revision History 


• Split spec into multiple documents


1.01
• Added bit(3) of vidProcCfg register to control whether to use the alpha bit in the 1555 desktop
mode for chroma-keying
• Added new desktop pixel format RGB1555 undithered in vidProcCfg bits(20:18)
• Added new overlay pixel formats RGB1555 dithered, RGB1555 undithered, and RGB32
undithered in vidProcCfg bits(23:21)
• Changed dramInit1[12] to “reserved”. Dither pass-thru mode is no longer supported in order to
support 1555 ARGB and 8888 ARGB 3D rendering modes.
• Added renderMode register (32-bit)
• Added renderMode[1:0] to control 3D rendering mode (16BPP 565 RGB, 15BPP 1555 ARGB, or
32BPP 8888 ARGB modes)
• Added renderMode[14:2] to allow dynamic setting of the Y-Origin subtraction value


1.02
• Now zaColor[23:16] are used for upper byte for 24-bit depth data
• Added renderMode[16:15] to control the behavior of the 1-bit alpha channel when running in 15
BPP rendering mode
• Added section in “Programming Caveats” for 32 BPP and 15 BPP rendering.
• Added description of the alpha channel alpha blending modes for 15, 16, and 32 BPP rendering
modes in the alphaMode register description
• Added separate writes for R, G, B, and Alpha for 32 BPP rendering mode in renderMode[20:17]
• Added support for stenciling with the creation of the stencilMode and stencilOp registers
• Added fbiStenciltestFail register
• Added linear write buffer write mode 0x8, which is a 24-bit depth value
• Added explanation under lfbMode register of how linear frame buffer writes work when running
in 32BPP rendering mode
• Documented in the zaColor register description the different use of zaColor depending on
whether 32BPP rendering mode is selected
• Modified description of CMDFIFO packet types 5 and 6 to allow 27-bit address specification in
word 1 for 128 MByte address accessibility (need 128 MByte address accessibility to be able to
address lfb space in both tiled and linear, each being up to 64 MBytes in size)
• Increased width of agpGraphicsAddress register to 27 bits for 128 MByte address accessibility
(need 128 MByte address accessibility to be able to address lfb space in both tiled and linear, each
being up to 64 MBytes in size)
• Increased width of yuvBaseAddress register to 26 bits for 64 MByte address accessibility
• Changed default definition of PCI configuration registers memBaseAddr0 and memBaseAddr1.
Both memory base address registers now default to allocate 128 MBytes of memory space each
from the system BIOS.
• Added bits in lfbMemoryConfig register to be able to select between tiled and linear address
space for 128 MBytes of addressability (tiled and linear address space, each up to 64 MBytes)
• Added bits in dramInit0 to select 8, 16, 32, and 64 Mbit sgrams/sdrams, and also select between
2 and 4 internal bank sgrams/sdrams.
• Added bits(24:23) in vgaInit0 to allow full 64 MByte addressability for the starting page of VGA.
• Added bits(30:29) in vgaInit1 to allow full 64 MByte addressability for VBE write and read
apertures.
• Updated “Calculating texel addresses” section
• Added power on strapping pin definition for VMI_ADDR_3 (PLL bypass)
• Added additional power-on strapping pins
• Added section of additional pins for Napalm (as compared to Avenger)


• Added further description of using the FASTFILL command when running in 32 BPP rendering
mode under the fastfillCMD register description and in the “Command Descriptions” section
• Clarifications to 32-bit depth linear frame buffer write format (format 8) in lfbMode register
description
• Added description of the stencil write mask capabilities in the “Programming Caveats” section
• Added new alpha blending functions A_SAMECOLOR and AOM_SAMECOLOR to allow
src*src and dst*dst blending calculations


• Changed ordering of stencil operations in the stencilOp register to match D3D spec
• Added poweron strapping values for dramInit0[29:28]
• Changed alpha-channel alpha blending factors for 15 BPP rendering mode to only be 0 and 1 (just
like 16 BPP rendering mode)
• Added combineMode register
• Updated CCU and ACU block diagrams under the fbzColorPath register description
• Added description in chromaRange and chromaKey for specifying constant colors into the
texture units
• Clarified behavior of linear frame buffer writes which bypass the pixel pipeline when running in
15 BPP rendering mode and which do not contain alpha information, in the lfbMode register
description
• Added SLI support in renderMode[31:29], commandExtra[31:29], and added new PCI config
register cfgSliCtrl[31:0]


1.06
• Updated CCU, ACU, Texture CCU, and Texture ACU diagrams


1.07
• Clarified FASTFILL operation when running in 15 BPP rendering mode and where the single bit
alpha value comes from.
• Clarified description of cc_outshift and cca_outshift in combineMode
• Added reversal of operations and subtraction capabilities for alpha blenders in the fogMode and
alphaMode register descriptions


• Added miscInit1[18] to disable broadcasting of TMU writes to both TMUs when 2 pixel-per-clock
mode is enabled
• Cleaned up naming conventions of clipping registers
• Added renderMode[21] to control triangle iterators guardband clipping, and added description of
how to use guardband clipping in the clipTopBottom1 register description


• Added 2 pixel-per-clock operation in combineMode and renderMode registers
• Added triangle column band selection in fbzColorPath[31:30]
• Added sliCtrl register
• Added aaCtrl register and description of Secondary rendering buffers in colBufferAddr and
auxBufferAddr register descriptions
• Added chipMask register
• Added chipID field in power on strapping values and in cfgSliCtrl[31:28]
• Cleaned up status register bit descriptions


1.09
• Removed renderMode[31:29] (old SLI support)
• Moved mctl_dram_numbanks field in dramInit0 to dramInit0[30], with default value controlled
by reset value of TV_DATA_4
• Added 1 bit to sgram type field in dramInit0[29:27] with default value controlled by reset value
of {TV_DATA_3, TV_DATA_2, VMI_DATA_6}
• Removed byte swizzling and word swapping of 2D, 3D, and non-modal LFB space as defined in
miscInit0[3:2] and miscInit0[31:30]
• Added address-based byte swizzling and word swapping for memBaseAddr1 as controlled by
miscInit1[22:20]
• Added strapInfo1 and lots of new poweron strapping values
• Added some placeholders in the “Programming Caveats” section
• Added the new 4 Meg TMU0 texture download aperture to the memory map (Section 7).
• Added texture format 15 (8888 ARGB) to the description of tformat (Section 11.72).
• Modified the texture download description (Section 23).
• Expanded the TMU0 texture download aperture in the memory map to 64 Meg (Section 7).
• Added the big bit to the LOD register. (Section 11.73).
• Modified the texture download description (Section 23).
• Modified the packet 5 description (Section 12.1.8).
• Added the tcompressed bit to the textureMode register. (Section 11.73).
• Modified the texture download description (Section 23).

1.10
• Added configuration registers cfgPciDecode, cfgVideoCtrl0, cfgVideoCtrl1, cfgVideoCtrl2,
cfgSliLfbCtrl, cfgSliAaTiledAperture, and cfgAaLfbCtrl
• Added configuration registers agpTestCtrl, agpTestData0, agpTestData1, agpTestData2, and
agpTestData3
• Moved around poweron strapping bits
• Updated description of memBaseAddr0 and memBaseAddr1
• VBE write aperture bits(11:10) now in vgaInit1[30:29]
• VBE read aperture bits(11:10) now in vgaInit0[26:25]
• Changed poweron value of vgaInit0[0] to be poweron strap value FB_DATA_21, and poweron
value of vgaInit0[9:8] to be poweron strap value {FB_DATA_22, FB_DATA_22}. Added
vga_valid_disable in vgaInit0[3] with default poweron strap value FB_DATA_23.
• Added note in pciInit0 that wait state bits (9:8) must both be set when bus snooping is enabled
• Added dither rotate functionality in renderMode[25] and fogMode[19:12]
• Added more bits in cmdFifoThresh
• Misc. changes to cfgInitEnable
• Added leftDesktopBuf register
• Added swapbufferCMD bit(10) to enable desktop swaps
• Added vidCurrDesktopStartAddr, read by writing bit(31) of IO address 0xfc and then reading
IO address 0xfc
• Added aaCtrl[30] to disable triangles to the primary rendering buffers when anti-aliasing is
enabled
• Cleaned up some out-of-date info in the register bit field descriptions of fbzColorPath and
textureMode
• Updated “Programming Tips & Caveats” section
• Renamed Avenger+ to Napalm


1.11
• Added Signal Strapping section
• Added bits(24:20) in tmuGbeInit to control clock delay settings for aa_clk signal
• cfgVideoCtrl0 bits(23:20) are now reserved (old aa_clk_del_adj[3:0] field)
• Added cfgSliAaMisc register
• Added aa_lfb_rd_format field to cfgAaLfbCtrl
• miscInit1[31:29] now control pd pin
• Added pci_device_id for poweron value of TV_DATA_6
• Changed lfbMemoryConfig to now consist of lfbMemoryTileCtrl and lfbMemoryTileCompare
• Renamed cfgSliAaTiledAperture to cfgAaDepthBufferAperture and added window comparison
for the depth buffer for AA reads
• Added miscInit0[31] to enable queued VMI host port writes
• Added lfbMode write format 9 for queued VMI host port writes
• Added miscInit0[30] to control access to the vip2vmiCtrl register
• Added vip2vmiCtrl register description
• Updated deviceID register to reflect ability to choose ID values of {6,7,8,9}
• Fixed textual description of stencilOp register
• Added problems with command fifo packet type 6 when using an AGP command fifo to the
“Programming Caveats” section


1 

• Added description of vip2vmi_intr_field and vip2vmi_intr_type in vip2vmiCtrl register
• Fixed width of agpGraphicsStride in the AGP/CMD register map
• Clarified agpReqSize register description
• Fixed typo in chipMask register description. 32 chips are supported.
• Fixed typo in clipLeftRight1/clipTopBottom1 register description
• Changed default value of Revision_ID register to be 0x1
• Fixed “lfb” (should be “lsb”) typo in the Device_ID register description
• Added note in CMDFIFO Packet Type 6 which states that Packet Type 6 may only be used with
frame buffer command FIFOs
• Updated CMDFIFO Packet Types 1 and 4 with proper chip field bits
• Added clarification in description of chipMask register that the chipMask register can only
disable writes to the 3D registers and 3D LFB space
• Fixed typo in aaCtrl register description -- secondary buffer offsets are controlled by aaCtrl
bits(27:14)
• Added hotplug interrupts in intrCtrl register


• Updated Programming Tips and combineMode register description to document workaround
required when switching from 2 pixel-per-clock rendering mode to single pixel-per-clock
rendering mode
• Documented dither rotation capabilities in the renderMode register description. Updated
fogMode register to clarify dither rotation capabilities.
• Documented bug with dither rotation when using FASTFILL commands and 3D LFBs in the
Programming Tips section and the renderMode register description
• Fixed 3D register description table so that nopCMD, combineMode, sliCtrl, and aaCtrl are now
properly labeled as maskable registers
• Added note about bugs in the FASTFILL command when in SLI
• Added aaCtrl bit(31) to enable auto reset of the cmd_repeat fifo (must be set if the triangle setup
unit is performing backface culling...)